THERAPEUTIC AND DIAGNOSTIC AGENTS

FIELD OF THE INVENTION 

The present invention relates generally to therapeutic and diagnostic agents. More particularly,
the present invention provides therapeutic molecules capable of modulating signal transduction
such as but not limited to cytokine-mediated signal transduction. The molecules of the present
invention are useful, therefore, in modulating cellular responsiveness to cytokines as well as other
mediators of signal transduction such as endogenous or exogenous molecules, antigens, microbes
and microbial products, viruses or components thereof, ions, hormones and parasites.

Bibliographic details of the publications referred to in this specification by author are collected
at the end of the description. Sequence Identity Numbers (SEQ ID NOs.) for the nucleotide and
amino acid sequences referred to in the specification are defined after the bibliography. A
summary of the SEQ ID NOs is given in Table 1.

Throughout this specification and the claims which follow, unless the context requires otherwise,
the word "comprise", or variations such as "comprises" or "comprising", will be understood to
imply the inclusion of a stated integer or group of integers but not the exclusion of any other
integer or group of integers.

BACKGROUND OF THE INVENTION 

Cells continually monitor their environment in order to modulate physiological and biochemical
processes which in turn affect future behaviour. Frequently, a cell's initial interaction with its
surroundings occurs via receptors expressed on the plasma membrane. Activation of these
receptors, whether through binding endogenous ligands (such as cytokines) or exogenous ligands
(such as antigens), triggers a biochemical cascade from the membrane through the cytoplasm to
the nucleus.

Of the endogenous ligands, cytokines represent a particularly important and versatile group.
Cytokines are proteins which regulate the survival, proliferation, differentiation and function of
a variety of cells within the body [Nicola, 1994]. The haemopoietic cytokines have in common
a four-alpha helical bundle structure and the vast majority interact with a structurally related
family of cell surface receptors, the type I and type II cytokine receptors [Bazan, 1990; Sprang,
1993]. In all cases, ligand-induced receptor aggregation appears to be a critical event in initiating
intracellular signal transduction cascades. Some cytokines, for example growth hormone,
erythropoietin (Epo) and granulocyte-colony-stimulating factor (G-CSF), trigger receptor
homodimerisation, while for other cytokines, receptor heterodimerisation or heterotrimerisation
is crucial. In the latter cases, several cytokines share common receptor subunits and on this basis
can be grouped into three subfamilies with similar patterns of intracellular activation and similar
biological effects [Hilton, 1994]. Interleukin-3 (IL-3), IL-5 and granulocyte-macrophage
colony-stimulating factor (GM-CSF) use the common β-receptor subunit (βc) and each cytokine
stimulates the production and functional activity of granulocytes and macrophages. IL-2, IL-4,
IL-7, IL-9, and IL-15 each use the common γ-chain (γc), while IL-4 and IL-13 share an
alternative γ-chain (γ'c or IL-13 receptor α-chain). Each of these cytokines plays an important
role in regulating acquired immunity in the lymphoid system. Finally, IL-6, IL-11, leukaemia
inhibitory factor (LIF), oncostatin-M (OSM), ciliary neurotrophic factor (CNTF) and
cardiotrophin (CT) share the receptor subunit gp130. Each of these cytokines appears to be
highly pleiotropic, having effects both within and outside the haemopoietic system [Nicola,
1994].

In all of the above cases at least one subunit of each receptor complex contains the conserved
sequence elements, termed box1 and box2, in their cytoplasmic tails [Murakami, 1991]. Box1
is a proline-rich motif which is located more proximal to the transmembrane domain than the
acidic box2 element. The box1 region serves as the binding site for a class of cytoplasmic
tyrosine kinases termed JAKs (Janus kinases). Ligand-induced receptor dimerisation serves to
increase the catalytic activity of the associated JAKs through cross-phosphorylation. Activated
JAKs then tyrosine phosphorylate several substrates, including the receptors themselves.
Specific phosphotyrosine residues on the receptor then serve as docking sites for SH2-containing
proteins, the best characterised of which are the signal transducers and activators of transcription
(STATs) and the adaptor protein, shc. The STATs are then phosphorylated on tyrosines,
probably by JAKs, dissociate from the receptor and form either homodimers or heterodimers
through the interaction of the SH2 domain of one STAT with the phosphotyrosine residue of the
other. STAT dimers then translocate to the nucleus where they bind to specific
cytokine-responsive promoters and activate transcription [Darnell, 1994; Ihle, 1995; Ihle, 1995]. In a
separate pathway, tyrosine phosphorylated shc interacts with another SH2 domain-containing
protein, Grb-2, leading ultimately to activation of members of the MAP kinase family and in turn
transcription factors such as fos and jun [Sato, 1993; Cutler, 1993]. These pathways are not
unique to members of the cytokine receptor family since cytokines that bind receptor tyrosine
kinases are also able to activate STATs and members of the MAP kinase family [David, 1996;
Leaman, 1996; Shuai, 1993; Sato, 1993; Cutler, 1993].

Four members of the JAK family of cytoplasmic tyrosine kinases have been described, JAK1,
JAK2, JAK3 and TYK2, each of which binds to a specific subset of cytokine receptor subunits.
Six STATs have been described (STAT1 through STAT6), and these too are activated by
distinct cytokine/receptor complexes. For example, STAT1 appears to be functionally specific
to the interferon system, STAT4 appears to be specific to IL-12, while STAT6 appears to be
specific for IL-4 and IL-13. Thus, despite common activation mechanisms some degree of
cytokine specificity may be achieved through the use of specific JAKs and STATs [Thierfelder,
1996; Kaplan, 1996; Takeda, 1996; Shimoda, 1996; Meraz, 1996; Durbin, 1996].

In addition to those described above, there are clearly other mechanisms of activation of these
pathways. For example, the JAK/STAT pathway appears to be able to activate MAP kinases
independent of the shc-induced pathway [David, 1995] and the STATs themselves can be
activated without binding to the receptor, possibly by direct interaction with JAKs [Gupta,
1996]. Conversely, full activation of STATs may require the action of MAP kinase in addition
to that of JAKs [David, 1995; Wen, 1995].



While the activation of these signalling pathways is becoming better understood, little is known
of the regulation of these pathways, including employment of negative or positive feedback
loops. This is important since once a cell has begun to respond to a stimulus, it is critical that
the intensity and duration of the response is regulated and that signal transduction is switched
off. It is likewise desirable to increase the intensity of a response systemically or even locally as
the situation requires.

In work leading up to the present invention, the inventors sought to isolate negative regulators
of signal transduction. The inventors have now identified a new family of proteins which are
capable of acting as regulators of signalling. The new family of proteins is defined as the
suppressor of cytokine signalling (SOCS) family based on the ability of the initially identified
SOCS molecules to suppress cytokine-mediated signalling. It should be noted, however, that
not all members of the SOCS family need necessarily share suppressor function nor target solely
cytokine mediated signalling. The SOCS family comprises at least three classes of protein
molecules based on amino acid sequence motifs located N-terminal of a C-terminal motif called
the SOCS box. The identification of this new family of regulatory molecules permits the
generation of a range of effector or modulator molecules capable of modulating signal
transduction and, hence, cellular responsiveness to a range of molecules including cytokines.
The present invention, therefore, provides therapeutic and diagnostic agents based on SOCS
proteins, derivatives, homologues, analogues and mimetics thereof as well as agonists and
antagonists of SOCS proteins.



SUMMARY OF THE INVENTION

The present invention provides inter alia nucleic acid molecules encoding members of the SOCS
family of proteins as well as the proteins themselves. Reference hereinafter to "SOCS"
encompasses any or all members of the SOCS family. Specific SOCS molecules are defined
numerically such as, for example, SOCS1, SOCS2 and SOCS3. The species from which the
SOCS has been obtained may be indicated by a preface of a single letter abbreviation where "h"
is human, "m" is murine and "r" is rat. Accordingly, "mSOCS1" is a specific SOCS from a murine
animal. Reference herein to "SOCS" is not to imply that the protein solely suppresses
cytokine-mediated signal transduction, as the molecule may modulate other effector-mediated
signal transductions such as by hormones or other endogenous or exogenous molecules,
antigens, microbes and microbial products, viruses or components thereof, ions, hormones and
parasites. The term "modulates" encompasses up-regulation, down-regulation as well as
maintenance of particular levels.

One aspect of the present invention provides a nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its
C-terminal region.

Another aspect of the present invention provides a nucleic acid molecule comprising a sequence
of nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its
C-terminal region and a protein:molecule interacting region.

Yet another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
C-terminal region and a protein:molecule interacting region located in a region N-terminal of the
SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or protein:protein binding
region.

Still a further aspect of the present invention provides a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
SOCS box in its C-terminal region and one or more of an SH2 domain, WD-40 repeats or
ankyrin repeats N-terminal of the SOCS box.

Even still a further aspect of the present invention is directed to a nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
protein or a derivative, homologue, analogue or mimetic thereof or a nucleotide sequence
capable of hybridizing thereto under low stringency conditions at 42°C wherein said protein
comprises a SOCS box in its C-terminal region wherein the SOCS box comprises the amino acid
sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P;

and a protein:molecule interacting region such as but not limited to one or more of an SH2
domain, WD-40 repeats and/or ankyrin repeats N-terminal of the SOCS box.
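
By way of illustration only, the SOCS box consensus recited above may be read as a pattern for scanning a protein sequence. The following is a minimal sketch under stated assumptions, not a method disclosed in this specification: the residue sets are transcribed from the consensus (X1, X4, X12, X17, X20, X24 and X28 being L, I, V, M, A or P), "any amino acid" is assumed to mean the twenty standard residues, and the names `socs_box_pattern`, `find_socs_box` and `SYNTHETIC` are hypothetical. The test peptide is synthetic and is not a SOCS sequence.

```python
import re

# Assumption: "any amino acid" is restricted to the 20 standard residues.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"

# Residue sets for X1..X16, X17..X23 and X24..X28, transcribed from the
# consensus. None stands for a position defined as "any amino acid".
HEAD = ["LIVMAP", None, "PTS", "LIVMAP", None, None, "LIVMAFYW", "CTS",
        "RKH", None, None, "LIVMAP", None, None, None, "LIVMAPGCTS"]
MIDDLE = ["LIVMAP", None, None, "LIVMAP", "P", "LIVMAPG", "PN"]
TAIL = ["LIVMAP", None, None, "YF", "LIVMAP"]


def _segment(residue_sets):
    """Turn a list of residue sets into a regular-expression fragment."""
    return "".join(ANY if s is None else f"[{s}]" for s in residue_sets)


def socs_box_pattern(max_gap=50):
    """Compile the consensus with [Xi]n and [Xj]n each spanning 1..max_gap residues."""
    gap = f"{ANY}{{1,{max_gap}}}"
    return re.compile(_segment(HEAD) + gap + _segment(MIDDLE) + gap + _segment(TAIL))


def find_socs_box(sequence, max_gap=50):
    """Return (start, end) of the first consensus match, or None if absent."""
    match = socs_box_pattern(max_gap).search(sequence.upper())
    return match.span() if match else None


# Synthetic peptide built to satisfy every position (NOT a real SOCS sequence):
# X1..X16, one residue for [Xi]n, X17..X23, one residue for [Xj]n, X24..X28.
SYNTHETIC = "LAPLAALCRAALAAAL" + "G" + "LAALPLP" + "G" + "LAAYL"

print(find_socs_box(SYNTHETIC))  # (0, 30)
```

The `max_gap` parameter bounds the length n of the two variable stretches. A match to this pattern indicates only that a sequence is consistent with the consensus; it says nothing about function.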

Another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridising thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence: 
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(ii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.



Preferably, the SOCS molecules modulate signal transduction such as from a cytokine or
hormone or other endogenous or exogenous molecule, a microbe or microbial product, an
antigen or a parasite.

More preferably, the SOCS molecules modulate cytokine-mediated signal transduction.

Still another aspect of the present invention comprises a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or comprises a nucleotide sequence capable
of hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) is capable of modulating signal transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(iii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the signal transduction is mediated by a cytokine such as one or more of EPO, TPO,
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or
M-CSF.

Preferably, the signal transduction is mediated by one or more of Interleukin 6 (IL-6), Leukaemia
Inhibitory Factor (LIF), Oncostatin M (OSM), Interferon (IFN)-γ and/or thrombopoietin.

Preferably, the signal transduction is mediated by IL-6.

Particularly preferred nucleic acid molecules comprise nucleotide sequences substantially set
forth in SEQ ID NO:3 (mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ
ID NO:9 (hSOCS1), SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15
and SEQ ID NO:16 (hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ
ID NO:20 (mSOCS6), SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24
(mSOCS7), SEQ ID NO:26 and SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ
ID NO:30 (mSOCS9), SEQ ID NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33
and SEQ ID NO:34 (hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12),
SEQ ID NO:38 and SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42
(hSOCS13), SEQ ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47
(hSOCS15) or a nucleotide sequence having at least about 15% similarity to all or a region of
any of the listed sequences or a nucleic acid molecule capable of hybridizing to any one of the
listed sequences under low stringency conditions at 42°C.

Another aspect of the present invention relates to a protein or a derivative, homologue, analogue
or mimetic thereof comprising a SOCS box in its C-terminal region.

Yet another aspect of the present invention is directed to a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a
protein:molecule interacting region.

Even yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof comprising an interacting region located in a region N-terminal of
the SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or a protein:protein binding
region.

Another aspect of the present invention contemplates a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a SH2 domain,
WD-40 repeats or ankyrin repeats N-terminal of the SOCS box.

Still yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(ii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the proteins modulate signal transduction such as cytokine-mediated signal
transduction.

Preferred cytokines are EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF,
IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.

A particularly preferred cytokine is IL-6.

Even yet another aspect of the present invention provides a protein or derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:
(i) is capable of modulating signal transduction such as cytokine-mediated signal
transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F;
         X28 is L, I, V, M, A or P; and

(iii) comprises at least one of a SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Particularly preferred SOCS proteins comprise an amino acid sequence substantially as set forth
in SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid
sequence having at least 15% similarity to all or a region of any one of the listed sequences.

Another aspect of the present invention contemplates a method of modulating levels of a SOCS
protein in a cell said method comprising contacting a cell containing a SOCS gene with an
effective amount of a modulator of SOCS gene expression or SOCS protein activity for a time
and under conditions sufficient to modulate levels of said SOCS protein.

A related aspect of the present invention provides a method of modulating signal transduction
in a cell containing a SOCS gene comprising contacting said cell with an effective amount of a
modulator of SOCS gene expression or SOCS protein activity for a time sufficient to modulate
signal transduction.

Yet a further related aspect of the present invention is directed to a method of influencing
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction.

In accordance with the present invention, n in [Xi]n and [Xj]n may, in addition to being from 1-50,
be from 1-30, 1-20, 1-10 and 1-5.

A summary of the SEQ ID NOs referred to in the subject specification is given in Table 1.

TABLE 1

SUMMARY OF SEQUENCE IDENTITY NUMBERS

SEQUENCE                                                  SEQ ID NO.

PCR Primer                                                1
PCR Primer                                                2
Mouse SOCS1 (nucleotide)                                  3
Mouse SOCS1 (amino acid)                                  4
Mouse SOCS2 (nucleotide)                                  5
Mouse SOCS2 (amino acid)                                  6
Mouse SOCS3 (nucleotide)                                  7
Mouse SOCS3 (amino acid)                                  8
Human SOCS1 (nucleotide)                                  9
Human SOCS1 (amino acid)                                  10
Rat SOCS1 (nucleotide)                                    11
Rat SOCS1 (amino acid)                                    12
nucleotide sequence of murine SOCS4                       13
amino acid sequence of murine SOCS4                       14
nucleotide sequence of SOCS4 cDNA human contig 4.1        15
nucleotide sequence of SOCS4 cDNA human contig 4.2        16
nucleotide sequence of murine SOCS5                       17
amino acid sequence of murine SOCS5                       18
nucleotide sequence of human SOCS5                        19
nucleotide sequence of murine SOCS6                       20
amino acid sequence of murine SOCS6                       21
nucleotide sequence of human SOCS6 contig h6.1            22
nucleotide sequence of human SOCS6 contig h6.2            23
nucleotide sequence of murine SOCS7                       24
amino acid sequence of murine SOCS7                       25
nucleotide sequence of human SOCS7 contig h7.1            26
nucleotide sequence of human SOCS7 contig h7.2            27
nucleotide sequence of murine SOCS8                       28
amino acid sequence of murine SOCS8                       29
nucleotide sequence of murine SOCS9                       30
nucleotide sequence of human SOCS9                        31
nucleotide sequence of murine SOCS10                      32
nucleotide sequence of human SOCS10 contig h10.1          33
nucleotide sequence of human SOCS10 contig h10.2          34
nucleotide sequence of human SOCS11                       35
amino acid sequence of human SOCS11                       36
nucleotide sequence of mouse SOCS12                       37
nucleotide sequence of human SOCS12 contig h12.1          38
nucleotide sequence of human SOCS12 contig h12.2          39
nucleotide sequence of murine SOCS13                      40
amino acid sequence of murine SOCS13                      41
nucleotide sequence of human SOCS13 cDNA contig h13.1     42
nucleotide sequence of murine SOCS14 cDNA                 43
amino acid sequence of murine SOCS14                      44
nucleotide sequence of murine SOCS15 cDNA                 45
amino acid sequence of murine SOCS15                      46
nucleotide sequence of human SOCS15                       47
amino acid sequence of human SOCS15                       48



Single and three letter abbreviations are used to denote amino acid residues and these are
summarized in Table 2.



TABLE 2

Amino Acid        Three-letter      One-letter
                  Abbreviation      Symbol

Alanine           Ala               A
Arginine          Arg               R
Asparagine        Asn               N
Aspartic acid     Asp               D
Cysteine          Cys               C
Glutamine         Gln               Q
Glutamic acid     Glu               E
Glycine           Gly               G
Histidine         His               H
Isoleucine        Ile               I
Leucine           Leu               L
Lysine            Lys               K
Methionine        Met               M
Phenylalanine     Phe               F
Proline           Pro               P
Serine            Ser               S
Threonine         Thr               T
Tryptophan        Trp               W
Tyrosine          Tyr               Y
Valine            Val               V
Any residue       Xaa               X
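
For convenience, the correspondence in Table 2 may be expressed as a simple lookup. The following is an illustrative sketch only; the names `THREE_TO_ONE` and `to_one_letter`, and the hyphen-separated input format, are assumptions made for the example and do not form part of the specification.

```python
# Three-letter to one-letter amino acid codes, transcribed from Table 2.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any residue
}


def to_one_letter(peptide, sep="-"):
    """Convert e.g. 'Leu-Ile-Val' to 'LIV'; raises KeyError on an unknown code."""
    codes = [code.strip().capitalize() for code in peptide.split(sep)]
    return "".join(THREE_TO_ONE[code] for code in codes)


print(to_one_letter("Leu-Ile-Val-Met-Ala-Pro"))  # LIVMAP
```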
BRIEF DESCRIPTION OF THE DRAWINGS

In some of the Figures, abbreviations are used to denote SOCS proteins with certain binding
motifs. SOCS proteins which contain WD-40 repeats are referred to as WSB1-WSB4. SOCS
proteins with ankyrin repeats are referred to as ASB1-ASB3.

Figure 1 is a diagrammatic representation showing generation of an IL-6-unresponsive M1 clone
by retroviral infection. The RUFneo retrovirus, showing the position of landmark restriction
endonuclease cleavage sites, the 4A2 cDNA insert and the position of PCR primer sequences.

Figure 2 is a photographic representation of Southern and Northern analysis. (Left and Middle
Panels) Southern blot analysis of genomic DNA from clone 4A2 and a control infected M1 clone.
DNA was digested with BamH I, to reveal the number of retroviruses carried by each clone, and
Sac I, to estimate the size of the retroviral cDNA insert. Left panel; probed with neo. Right
panel; probed with the Xho I-digested 4A2 PCR product. (Right Panel) Northern blot analysis
of total RNA from clone 4A2 and a control infected M1 clone, probed with the Xho I-digested
4A2 PCR product. The two bands represent unspliced and spliced retroviral transcripts,
resulting from splice donor and acceptor sites in the retroviral genome.

Figure 3 is a representation of the nucleotide sequence and structure of the SOCS1 gene. A.
The genomic context of SOCS1 in relation to the protamine gene cluster on murine chromosome
16. The accession number of this locus is MMPRMGNS (direct submission; G. Schlueter, 1995)
for the mouse and BTPRMTNP2 for the rat (direct submission; G. Schlueter, 1996). B. The
nucleotide sequence of the SOCS1 cDNA and deduced amino acid sequence. Conventional one
letter abbreviations are used for the amino acid sequence and the asterisk indicates the stop
codon. The polyadenylation signal sequence is underlined. The coding region is shown in
uppercase and the untranslated region is shown in lower case.

Figure 4 is a graphical representation of cell differentiation in the presence of cytokines.
Semi-solid agar cultures of parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1 (4A2
and M1.mpl.SOCS1), were used and the percentage of colonies which differentiated in response
to a titration of 1 mg/ml IL-6 (•), 100 ng/ml LIF (0), 1 mg/ml OSM (□), 100 ng/ml IFN-γ,
500 ng/ml TPO (•), or 3x10*^ M dexamethasone (*) determined.

Figure 5 is a photographic representation of cytospins of liquid cultures of parental M1 cells
(M1 and M1.mpl) and M1 cells expressing SOCS1 (4A2 and M1.mpl.SOCS1) cultured for 4
days in the presence of 10 ng/ml IL-6 or saline. Unlike parental M1 cells, morphological features
consistent with macrophage differentiation are not observed in M1 cells constitutively expressing
SOCS1 (4A2 and M1.mpl.SOCS1) when cultured in IL-6.

Figure 6 is a photographic representation showing inhibition of phosphorylation of signalling
molecules by SOCS1. Parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1
(4A2 and M1.mpl.SOCS1) were incubated in the absence (-) or presence (+) of 10 ng/ml of IL-6
for 4 minutes at 37°C. Cells were then lysed and extracts were either immunoprecipitated using
anti-mouse gp130 antibody prior to SDS-PAGE (two upper panels) or were electrophoresed
directly (two lower panels). Gels were blotted and the filters were then probed with
anti-phosphotyrosine (upper panel), anti-gp130 antibody (second top panel), anti-phospho-STAT3
(second bottom panel) or anti-STAT3 (lower panel). Blots were visualised using
peroxidase-conjugated secondary antibodies and Enhanced Chemiluminescence (ECL) reagents.

Figure 7 is a representation of protein extracts prepared from (A) M1 cells or M1 cells
expressing SOCS1 (4A2) and (B) M1.mpl cells or M1.mpl.SOCS1 cells incubated for 10 min
at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml IL-6 or 100 ng/ml IFN-γ.
The binding reactions contained 4-6 μg protein (constant within a given experiment), 5 ng
32P-labelled m67 oligonucleotide encoding the high affinity SIF (c-sis-inducible factor) binding
site, and 800 ng sonicated salmon sperm DNA. For certain experiments, protein samples were
preincubated with an excess of unlabelled m67 oligonucleotide, or antibodies specific for either
STAT1 or STAT3.

Figure 8 is a photographic representation of Northern hybridisation. Mice were injected
intravenously with 2 μg and after various periods of time, the livers were removed and polyA+
mRNA was purified. M1 cells were stimulated for various lengths of time with 500 ng/ml of
IL-6, after which polyA+ mRNA was isolated. mRNA was fractionated by electrophoresis and
immobilized on nylon filters. Northern blots were prehybridized, hybridized with random-primed
32P-labelled SOCS1 or GAPDH DNA fragments, washed and exposed to film overnight.

Figure 9 is a representation of a comparison of the amino acid sequences of SOCS1, SOCS2,
SOCS3 and CIS. Alignment of the predicted amino acid sequence of mouse (mm), human (hs)
and rat (rr) SOCS1, SOCS2, SOCS3 and CIS. Those residues shaded are conserved in three or
four mouse SOCS family members. The SH2 domain is boxed in solid lines, while the SOCS box
is bounded by double lines.

Figure 10 is a photographic representation showing the phenotype of IL-6 unresponsive M1 cell
clone, 4A2. Colonies of parental M1 cells (left panel) and clone 4A2 (right panel) cultured in
semi-solid agar for 7 days in saline or 100 ng/ml IL-6.

Figure 11 is a photographic representation showing expression of mRNA for SOCS family
members in vitro and in vivo.

(A) Northern analysis of mRNA from a range of mouse organs showing constitutive
expression of SOCS family members in a limited number of tissues.
(B) Northern analysis of mRNA from liver and M1 cells showing induction of expression of
SOCS family members following exposure to IL-6.

(C) Reverse transcriptase PCR analysis of mRNA from bone marrow showing induction of
expression of SOCS family members by a range of cytokines.

Figure 12 is a photographic representation showing SOCS1 suppresses the phosphorylation and
activation of gp130 and STAT-3.

(A) Western blots of extracts from parental M1 cells (M1 and M1.mpl) and M1 cells
expressing SOCS1 (4A2 and M1.mpl.SOCS1) stimulated with (+) or without (-) 100 ng/ml IL-6.
Top: Extracts immunoprecipitated with anti-gp130 (αgp130) and immunoblotted with
anti-phosphotyrosine (αPY-STAT3), or for STAT3 (αSTAT3) to demonstrate equal loading of
protein. The molecular weights of the bands are shown on the right.
(B) EMSA of M1.mpl and M1.mpl.SOCS1 cells stimulated with (+) and without (-) 100
ng/ml IL-6 or 100 ng/ml IFNγ. The DNA-binding complexes SIF A, B and C are indicated at
the left.

Figure 13 is a representation of a comparison of the amino acid sequence of the SOCS proteins.
(A) Schematic representation of structures of SOCS proteins including proteins which contain
WD-40 repeats (WSB) and ankyrin repeats (ASB). (B) Alignment of N-terminal regions of
SOCS proteins. (C) Alignment of the SH2 domains of CIS, SOCS1, 2, 3, 5, 9, 11 and 14. (D)
Alignment of the WD-40 repeats of SOCS4, SOCS6, SOCS13 and SOCS15. (E) Alignment of
the ankyrin repeats of SOCS7 and SOCS10. (F) Alignment of the regions between SH2, WD-40
and ankyrin repeats and the SOCS box. (G) Alignment of the SOCS box. In each case the
conventional one letter abbreviations for amino acids are used, with X denoting residues of
uncertain identity and OOO denoting the beginning and the end of contigs. Amino acid
sequence obtained from conceptual translation of nucleic acid sequence derived from isolated
cDNAs is shown in upper case while amino acid sequence obtained by conceptual translation of
ESTs is shown in lower case and is approximate only. Conserved residues, defined as (LIVMA),
(FYW), (DE), (QN), (C, S, T), (KRH), (PG) are shaded in the SH2 domain, WD-40 repeats,
ankyrin repeats and the SOCS box. For the alignment of SH2 domains, WD-40 repeats and
ankyrin repeats a consensus sequence is shown above. In each case this has been derived from
examination of a large and diverse set of domains (Neer et al., 1994; Bork, 1993).
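
The conserved residue groups named in the Figure 13 description can be expressed as a small lookup. The following is a minimal sketch only, reading the second group as (FYW). The function names are hypothetical, and the rule that every residue in an alignment column must fall within a single group is an assumption; the description does not state how many sequences must agree before a position is shaded.

```python
# Conserved residue groups as listed in the Figure 13 description.
CONSERVED_GROUPS = ("LIVMA", "FYW", "DE", "QN", "CST", "KRH", "PG")


def shared_group(column):
    """Return the group that contains every residue in `column`, or None.

    `column` is a string of one-letter amino acid codes, one per aligned
    sequence. Requiring all residues to share one group is an assumption.
    """
    residues = set(column.upper())
    for group in CONSERVED_GROUPS:
        if residues <= set(group):
            return group
    return None


def conserved_positions(aligned):
    """Return column indices at which all aligned sequences share a group."""
    length = min(len(seq) for seq in aligned)
    return [
        i for i in range(length)
        if shared_group("".join(seq[i] for seq in aligned)) is not None
    ]
```

Gap characters and X (uncertain identity) belong to no group, so a column containing either is reported as not conserved.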

Figures 14(A) and (B) are photographic representations showing analysis of mRNA expression
of mouse SOCS1 and SOCS5 and SOCS containing a WD-40 repeat (WSB2) and ankyrin
repeats (ASB1).

Figure 15 is a representation showing the nucleotide sequence of the mouse SOCS4 cDNA. The
nucleotides encoding the mature coding region from the predicted ATG "start" codon to the stop
codon are shown in upper case, while the predicted 5' and 3' untranslated regions are shown in
lower case. The relationship of mouse cDNA sequence to mouse and human EST contigs is
illustrated in Figure 17.

Figure 16 is a representation showing the predicted amino acid sequence of the mouse SOCS4
protein, derived from the nucleotide sequence in Figure 15. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 18 is a representation showing the nucleotide sequence of human SOCS4 cDNA contigs
h4.1 and h4.2, derived from analysis of ESTs listed in Table 4.1. The relationship of these 
contigs to the mouse cDNA sequence is illustrated in Figure 17. 

Figure 19 is a diagrammatic representation showing the relationship of mouse SOCS5 genomic
(57-2) and cDNA (5-3-2) clones to contigs derived from analysis of mouse ESTs (Table 5.1) and
human cDNA clone (5-94-2) and ESTs (Table 5.2). The nucleotide sequence of the mouse
SOCS5 contig is shown in Figure 20, with the sequence of human SOCS5 contig (h5.1) being
shown in Figure 21. The deduced amino acid sequence of mouse SOCS5 is shown in Figure
20B. The structure of the protein is shown schematically, with the SH2 domain indicated by
( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are shown by the thin
solid line.

Figure 20A is a representation showing the nucleotide sequence of the mouse SOCS5 derived
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 19.

Figure 20B is a representation of the predicted amino acid sequence of mouse SOCS5 protein,
derived from the nucleotide sequence in Figure 20A. The SOCS box, which is also shown in
Figure 13, is underlined.

Figure 21 is a representation showing the nucleotide sequence of human SOCS5 cDNA contig
h5.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 5.2. The
relationship of these contigs to the mouse cDNA sequence is illustrated in Figure 19.

Figure 22 is a diagrammatic representation showing the relationship of mouse SOCS6 cDNA
clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N and 6-5N) to contigs derived from analysis
of mouse ESTs (Table 6.1) and human ESTs (Table 6.2). The nucleotide sequence of the mouse
SOCS6 contig is shown in Figure 23, with the sequence of human SOCS6 contigs (h6.1 and
h6.2) being shown in Figure 24. The deduced amino acid sequence of mouse SOCS6 is shown
in Figure 23B. The structure of the protein is shown schematically, with the WD-40 repeats
indicated by ( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are
shown by the thin solid line.

Figure 23A is a representation showing the nucleotide sequence of the mouse SOCS6 derived
from analysis of cDNA clone 64-10A-11. The nucleotides encoding the part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case. The relationship of mouse cDNA sequence to
mouse and human EST contigs is illustrated in Figure 22.

Figure 23B is a representation showing the predicted amino acid sequence of mouse SOCS6
protein, derived from the nucleotide sequence in Figure 23A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 24 is a representation showing the nucleotide sequence of human SOCS6 cDNA contig
h6.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 6.2. The
relationship of these contigs to the mouse cDNA sequence is illustrated in Figure 22.

Figure 25 is a diagrammatic representation showing the relationship of mouse SOCS7 cDNA
clone (74-10A-11) to contigs derived from analysis of mouse ESTs (Table 7.1) and human ESTs
(Table 7.2). The nucleotide sequence of the mouse SOCS7 contig is shown in Figure 26 with
the sequence of human SOCS7 contigs (h7.1 and h7.2) being shown in Figure 27. The deduced
amino acid sequence of mouse SOCS7 is shown in Figure 26B. The structure of the protein is
shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box by ( ). The
putative 5' and 3' untranslated regions are shown by the thin solid line in the mouse and by the
wavy line in h7.2. Based on analysis of clones isolated to date and ESTs the 3' untranslated
regions of mSOCS7 and hSOCS7 share little similarity.

Figure 26A is a representation showing the nucleotide sequence of the mouse SOCS7 derived
from analysis of cDNA clone 74-10A-11. The nucleotides encoding the part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case. The relationship of mouse cDNA sequence to
mouse and human EST contigs is illustrated in Figure 25.

Figure 26B is a representation showing the predicted amino acid sequence of mouse SOCS7
protein, derived from the nucleotide sequence in Figure 26A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 27 is a representation showing the nucleotide sequence of human SOCS7 cDNA contigs
h7.1 and h7.2 derived from analysis of the ESTs listed in Table 7.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 25.

Figure 28 is a diagrammatic representation of the relationship of sequence derived from analysis
of mouse SOCS8 ESTs (Table 8.1 and Figure 29A) to the predicted protein structure of mouse
SOCS8. The deduced partial amino acid sequence of mouse SOCS8 is shown in Figure 29B.
The structure of the protein is shown schematically with the SOCS box highlighted ( ). The
predicted 3' untranslated region is shown by the thin line.

Figure 29A is a representation showing the partial nucleotide sequence of mouse SOCS8 cDNA
(contig 8.1) derived from analysis of ESTs. The nucleotides encoding the part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case.

Figure 29B is a representation showing the partial predicted amino acid sequence of the mouse
SOCS8 protein, derived from the nucleotide sequence in Figure 29A. The SOCS box, which
is also shown in Figure 13, is underlined.

Figure 30 is a diagrammatic representation showing the relationship of mouse SOCS9 ESTs
(Table 9.1) and human SOCS9 ESTs (Table 9.2). The nucleotide sequence of the mouse SOCS9
contig (m9.1) is shown in Figure 31, with the sequence of human SOCS9 contig (h9.1) being
shown in Figure 32. The deduced amino acid sequence of human SOCS9 is shown
schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The putative 3'
untranslated region is shown by the thin solid line.

Figure 31 is a representation showing the partial nucleotide sequence of mouse SOCS9 cDNA
(contig m9.1), derived from analysis of the ESTs listed in Table 9.1. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 30.

Figure 32 is a representation showing the partial nucleotide sequence of human SOCS9 cDNA
(contig h9.1), derived from analysis of the ESTs listed in Table 9.2. Although it is clear that
contig h9.1 encodes a protein with an SH2 domain and a SOCS box, the quality of the sequence
is not high enough to derive a single unambiguous open reading frame. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 30.

Figure 33 is a representation showing the relationship of mouse SOCS10 cDNA clones (10-9,
10-12, 10-23 and 10-24) to contigs derived from analysis of mouse ESTs (Table 10.1) and
human ESTs (Table 10.2). The nucleotide sequence of the mouse SOCS10 contig is shown in
Figure 10.2, with the sequence of human SOCS10 contigs (h10.1 and h10.2) being shown in
Figure 35. The predicted structure of the protein is shown schematically, with the ankyrin
repeats indicated by ( ) and the SOCS box by ( ). The putative 3' untranslated region is shown
by the thin solid line in the mouse and by the wavy line in h10.2. Based on analysis of clones
isolated to date and ESTs the 3' untranslated regions of mSOCS-10 and hSOCS-10 share little
similarity.

Figure 34 is a representation showing the nucleotide sequence of the mouse SOCS10 derived
from analysis of cDNA clones 10-9, 10-12, 10-23 and 10-24. The nucleotides encoding the part
of the predicted coding region, ending in the stop codon, are shown in upper case, while the
predicted 3' untranslated regions are shown in lower case. Although it is clear that contig m10.1
encodes a protein with a series of ankyrin repeats and a SOCS box, the quality of the sequence
is not high enough to derive a single unambiguous open reading frame. The relationship of
mouse cDNA sequence to mouse and human EST contigs is illustrated in Figure 33.

Figure 35 is a representation showing the nucleotide sequence of human SOCS10 cDNA contigs
h10.1 and h10.2 derived from analysis of the ESTs listed in Table 10.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 33.

Figure 36A is a representation showing the partial nucleotide sequence of the human SOCS11
cDNA derived from analysis of ESTs listed in Table 11.1. The nucleotides encoding the mature
coding region from the predicted ATG "start" codon to the stop codon are shown in upper case,
while the predicted 5' and 3' untranslated regions are shown in lower case. The relationship of
the partial cDNA sequence, derived from ESTs, to the predicted protein is shown in Figure 37.

Figure 36B is a representation showing the partial predicted amino acid sequence of human
SOCS11 protein, derived from the nucleotide sequence in Figure 36A. The SOCS box, which
is also shown in Figure 13, is underlined.

Figure 37 is a diagrammatic representation showing the relationship of sequence derived from
analysis of human SOCS11 ESTs (Table 11.1 and Figure 36A) to the predicted protein structure
of human SOCS11. The deduced partial amino acid sequence of human SOCS11 is shown in
Figure 36B. The structure of the protein is shown schematically with the SH2 domain shown
by ( ) and the SOCS box highlighted by ( ). The predicted 3' untranslated region is shown by
the thin line.

Figure 38 is a diagrammatic representation showing the relationship of mouse SOCS12 cDNA
clones (12-1) to contigs derived from analysis of mouse ESTs (Table 12.1) and human ESTs
(Table 12.2). The nucleotide sequence of the mouse SOCS12 contig is shown in Figure 12.2,
with the sequence of human SOCS12 contigs (h12.1 and h12.2) being shown in Figure 40. The
deduced partial amino acid sequence of mouse SOCS12 is shown in Figure 39. The structure
of the protein is shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box
by ( ). The putative 3' untranslated region is shown by the thin solid line in the mouse and
by the wavy line in h12.2. Based on analysis of clones isolated to date and ESTs the 3'
untranslated regions of mSOCS12 and hSOCS12 share little similarity.

Figure 39 is a representation showing the nucleotide sequence of the mouse SOCS12 derived
from analysis of cDNA clone 12-1 and the ESTs listed in Table 12.1. The nucleotides encoding
the part of the predicted coding region, including the stop codon, are shown in upper case, while
the predicted 3' untranslated region is shown in lower case. Although by homology with human
SOCS12 it is clear that contig m12.1 encodes a protein with a series of ankyrin repeats and a
SOCS box, the quality of the sequence is not high enough to derive a single unambiguous open
reading frame. The relationship of mouse cDNA sequence to mouse and human EST contigs is
illustrated in Figure 38.

Figure 40 is a representation showing the nucleotide sequence of human SOCS12 cDNA contigs
h12.1 and h12.2 derived from analysis of the ESTs listed in Table 12.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 38.

Figure 41 is a diagrammatic representation showing the relationship of contig m13.1 derived
from analysis of mouse SOCS13 cDNA clones (62-1, 62-6-7, 62-14) and mouse ESTs (Table
13.1) to contig h13.1 derived from analysis of human ESTs (Table 13.2). The nucleotide
sequence of the mouse SOCS13 contig is shown in Figure 42, with the sequence of human
SOCS13 contig (h13.1) being shown in Figure 43. The deduced amino acid sequence of mouse
SOCS13 is shown in Figure 42B. The structure of the protein is shown schematically, with the
WD-40 repeats highlighted by ( ) and the SOCS box highlighted by ( ). The 3' untranslated
region is shown by the thin solid line.

Figure 42A is a representation showing the nucleotide sequence of the mouse SOCS13 derived
from analysis of cDNA clones 62-1, 62-6-7 and 62-14. The nucleotides encoding part of the
predicted coding region, ending in the stop codon, are shown in upper case, while those encoding
the predicted 3' untranslated regions are shown in lower case. The relationship of mouse cDNA
sequence to mouse and human EST contigs is illustrated in Figure 41.



Figure 42B is a representation showing the predicted amino acid sequence of mouse SOCS13
protein, derived from the nucleotide sequence in Figure 42A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 43 is a representation showing the nucleotide sequence of human SOCS13 cDNA contig
h13.1 derived from analysis of the ESTs listed in Table 13.2. The relationship of these contigs
to the mouse cDNA sequence is illustrated in Figure 41.

Figure 44 is a diagrammatic representation showing the relationship of a partial mouse SOCS14
cDNA clone (14-1) to contigs derived from analysis of mouse ESTs (Table 14.1). The
nucleotide sequence of the mouse SOCS14 contig is shown in Figure 45. The deduced partial
amino acid sequence of mouse SOCS14 is shown in Figure 45B. The structure of the protein
is shown schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The
putative 3' untranslated region is shown by the thin line.

Figure 45A is a representation showing the nucleotide sequence of the mouse SOCS14 derived
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 44.

Figure 45B is a representation showing the predicted amino acid sequence of mouse SOCS14
protein, derived from the nucleotide sequence in Figure 45A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 46 is a diagrammatic representation showing the relationship of contig m15.1 derived
from analysis of mouse BAC and mouse ESTs (Table 15.1) to contig h15.1 derived from analysis
of the human BAC and human ESTs (Table 15.2). The nucleotide sequence of the mouse
SOCS15 contig is shown in Figure 47, with the sequence of human SOCS15 contig (h15.1)
being shown in Figure 48. The deduced amino acid sequence of mouse SOCS15 is shown in
Figure 47B. The structure of the protein is shown schematically, with the WD-40 repeats
highlighted by ( ) and the SOCS box highlighted by ( ). The 5' and 3' untranslated regions are
shown by the thin solid line. The introns which interrupt the coding region are shown by \

Figure 47A is a representation showing the nucleotide sequence covering the mouse SOCS15
gene derived from analysis of the mouse BAC listed in Table 15.1. The nucleotides encoding the
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3'
untranslated region are shown in lower case. The relationship of mouse BAC to mouse and
human EST contigs is illustrated in Figure 46.

Figure 47B is a representation showing the predicted amino acid sequence of mouse SOCS15
protein, derived from the nucleotide sequence in Figure 47A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 48A is a representation showing the nucleotide sequence covering the human SOCS15
gene derived from analysis of the human BAC listed in Table 15.2. The nucleotides encoding the
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3'
untranslated region are shown in lower case. The relationship of the human BAC to mouse and
human EST contigs is illustrated in Figure 46.

Figure 48B is a representation showing the predicted amino acid sequence of human SOCS15
protein, derived from the nucleotide sequence in Figure 48A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 49 is a photographic representation showing SOCS1 inhibition of JAK2 kinase activity.
(A) Upper panel: Cos M6 cells were transiently transfected with either Flag-tagged mJAK2 and
mSOCS-1 DNA (SOCS1) or Flag-mJAK2 DNA alone (-), lysed, JAK2 proteins
immunoprecipitated using anti-JAK2 antibody and subjected to an in vitro kinase assay. Lower
panel: A portion of the JAK2 immunoprecipitates was Western blotted with anti-JAK2
antibody. (B) Upper panel: Cos M6 cells were transiently transfected with Flag-mJAK2 and
Flag-mSOCS-1 DNA or Flag-mJAK2 DNA alone, lysed, JAK2 proteins immunoprecipitated
using anti-JAK2 (UBI) and separated by SDS/PAGE gel. Immunoprecipitates were then
analysed by Western blot with anti-phosphotyrosine antibody. Lower panel: JAK2 expression.
Cos cell lysates were separated by SDS/PAGE gel and analysed by Western blot with anti-FLAG
antibody (M2).

Figure 50 is a photographic representation showing interaction between JAK2 and SOCS
protein. (A) Cos M6 cells were transiently transfected with Flag-tagged mJAK2 and various
Flag-tagged SOCS DNAs (SOCS-1;S1, SOCS-2;S2, SOCS-3;S3, CIS) or Flag-mJAK2 alone,
lysed, JAK2 proteins immunoprecipitated using anti-JAK2 (UBI) and separated by SDS/PAGE.
Immunoprecipitates were then analysed by Western blot with anti-FLAG antibody (M2). (B)
Cos cell lysates described in (A) were separated by SDS/PAGE and expression levels of the
various proteins were determined by Western blot with anti-FLAG antibody (M2). (C) JAK2
tyrosine phosphorylation. Cos cell lysates described in (A) were separated by SDS/PAGE and
proteins analysed by Western blot with anti-phosphotyrosine antibody.

Figure 51 is a diagrammatic representation of pβgalpAloxneo.

Figure 52 is a diagrammatic representation of pβgalpAloxneoTK.

Figure 53 is a diagrammatic representation of SOCS1 knockout construct.



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DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

The present invention provides a new family of modulators of signal transduction. As the initial
members of this family suppressed cytokine signalling, the family is referred to as the
"suppressors of cytokine signalling" family or "SOCS". The SOCS family is defined by the
presence of a C-terminal domain referred to as a "SOCS box". Different classes of SOCS
molecules are defined by a motif generally but not exclusively located N-terminal to the SOCS
box and which is involved in protein:molecule interaction such as protein:DNA or
protein:protein interaction. Particularly preferred motifs are selected from an SH2 domain,
WD-40 repeats and ankyrin repeats.

WD-40 repeats were originally recognised in the β-subunit of G-proteins. WD-40 repeats appear
to form a β-propeller-like structure and may be involved in protein-protein interactions. Ankyrin
repeats were originally recognised in the cytoskeletal protein ankyrin.



Members of the SOCS family may be identified by any number of means. For example, SOCS1
to SOCS3 were identified by their ability to suppress cytokine-mediated signal transduction and,
hence, were identified based on activity. SOCS4 to SOCS15 were identified as nucleotide
sequences exhibiting similarity at the level of the SOCS box.



The SOCS box is a conserved motif located in the C-terminal region of the SOCS molecule. In
accordance with the present invention, the amino acid sequence of the SOCS box is:



wherein:

X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.
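
By way of illustration only, the position definitions above can be written as a regular expression. The sketch below assumes that the positions occur in the order in which they are listed, that is, X1 to X16, then [Xi]n, then X17 to X23, then [Xj]n, then X24 to X28; the linear formula itself is not reproduced in this text, so that ordering is an inference. The names SOCS_BOX and find_socs_box are hypothetical.

```python
import re

# "Any amino acid" is taken to mean the 20 standard one-letter codes.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
LIVMAP = "[LIVMAP]"

# X1..X16, X17..X23 and X24..X28, in the order listed in the definitions.
X1_TO_X16 = [
    LIVMAP, ANY, "[PTS]", LIVMAP, ANY, ANY, "[LIVMAFYW]", "[CTS]",
    "[RKH]", ANY, ANY, LIVMAP, ANY, ANY, ANY, "[LIVMAPGCTS]",
]
X17_TO_X23 = [LIVMAP, ANY, ANY, LIVMAP, "P", "[LIVMAPG]", "[PN]"]
X24_TO_X28 = [LIVMAP, ANY, ANY, "[YF]", LIVMAP]

# [Xi]n and [Xj]n: 1 to 50 residues of any amino acid.
SPACER = ANY + "{1,50}"

# Assumed ordering: X1-X16, [Xi]n, X17-X23, [Xj]n, X24-X28.
SOCS_BOX = re.compile(
    "".join(X1_TO_X16) + SPACER + "".join(X17_TO_X23) + SPACER + "".join(X24_TO_X28)
)


def find_socs_box(protein):
    """Return (start, end) of the first motif match in `protein`, or None."""
    match = SOCS_BOX.search(protein.upper())
    return match.span() if match else None
```

A match indicates only that a sequence satisfies the degenerate motif as transcribed here; it is not by itself evidence that the protein is a member of the SOCS family.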



As stated above and in accordance with the present invention, SOCS proteins are divided into
separate classes based on the presence of a protein:molecule interacting region such as but not
limited to an SH2 domain, WD-40 repeats and ankyrin repeats located N-terminal of the SOCS
box. The latter three domains are protein:protein interacting domains.

Examples of SH2 containing SOCS proteins include SOCS1, SOCS2, SOCS3, SOCS5, SOCS9,
SOCS11 and SOCS14. Examples of SOCS containing WD-40 repeats include SOCS4, SOCS6
and SOCS15. Examples of SOCS containing ankyrin repeats include SOCS7, SOCS10 and
SOCS12.
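
The class membership given in the preceding paragraph can be held in a simple lookup table. This is a sketch for illustration; the names SOCS_CLASSES and socs_class are hypothetical, and only the examples named in that paragraph are included.

```python
# Class membership exactly as given in the examples above.
SOCS_CLASSES = {
    "SH2 domain": ["SOCS1", "SOCS2", "SOCS3", "SOCS5", "SOCS9", "SOCS11", "SOCS14"],
    "WD-40 repeats": ["SOCS4", "SOCS6", "SOCS15"],
    "ankyrin repeats": ["SOCS7", "SOCS10", "SOCS12"],
}


def socs_class(member):
    """Return the N-terminal motif class for a SOCS family member, or None.

    Spaces and hyphens are ignored, so "SOCS-6" and "socs 6" both resolve.
    """
    name = member.upper().replace(" ", "").replace("-", "")
    for motif, members in SOCS_CLASSES.items():
        if name in members:
            return motif
    return None
```

Members that are not named in the paragraph, such as SOCS8, return None rather than a guessed class.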

The present invention provides inter alia nucleic acid molecules encoding SOCS proteins,
purified naturally occurring SOCS proteins as well as recombinant forms of SOCS proteins and
methods of modulating signal transduction by modulating activity of SOCS proteins or
expression of SOCS genes. Preferably, signal transduction is mediated by a cytokine, examples
of which include EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12,
IFNγ, TNFα, IL-1 and/or M-CSF. Particularly preferred cytokines include IL-6, LIF, OSM,
IFNγ and/or thrombopoietin.

Accordingly, one aspect of the present invention provides an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
protein or a derivative, homologue, analogue or mimetic thereof or comprises a nucleotide
sequence capable of hybridizing thereto under low stringency conditions at 42°C wherein said
protein comprises a SOCS box in its C-terminal region and optionally a protein:molecule
interacting domain N-terminal of the SOCS box.

Preferably, the protein:molecule interacting domain is a protein:DNA or protein:protein
interacting domain. Most preferably, the protein:molecule interacting domain is one of an SH2
domain, WD-40 repeats and/or ankyrin repeats.

As stated above, preferably the subject SOCS modulate cytokine-mediated signal transduction.
The present invention extends, however, to SOCS molecules modulating other effector-mediated
signal transduction such as mediated by other endogenous or exogenous molecules, antigens,
microbes and microbial products, viruses or components thereof, ions, hormones and parasites.

Endogenous molecules in this context are molecules produced within the cell carrying the SOCS
molecule. Exogenous molecules are produced by other cells or are introduced to the body.

Preferably, the nucleic acid molecule or SOCS protein is in isolated or purified form. The terms
"isolated" and "purified" mean that a molecule has undergone at least one purification step away
from other material.

Preferably, the nucleic acid molecule is in isolated form and is DNA such as cDNA or genomic
DNA. The DNA may encode the same amino acid sequence as the naturally occurring SOCS
or the SOCS may contain one or more amino acid substitutions, deletions and/or additions. The
nucleotide sequence may correspond to the genomic coding sequence (including exons and
introns) or to the nucleotide sequence in cDNA from mRNA transcribed from the genomic gene
or it may carry one or more nucleotide substitutions, deletions and/or additions thereto.

In a preferred embodiment, the nucleic acid molecule comprises a sequence of nucleotides
encoding or complementary to a sequence encoding a SOCS protein or a derivative, homologue,
analogue or mimetic thereof wherein the amino acid sequence of said SOCS protein is selected
from SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or encodes an amino
acid sequence with a single or multiple amino acid substitution, deletion and/or addition to the
listed sequences or is a nucleotide sequence capable of hybridizing to the nucleic acid molecule
under low stringency conditions at 42°C.

In an even more preferred embodiment, the present invention provides a nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
SOCS protein or a derivative, homologue, analogue or mimetic thereof wherein the nucleotide
sequence is selected from a nucleotide sequence substantially set forth in SEQ ID NO:3
(mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ ID NO:9 (hSOCS1),
SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15 and SEQ ID NO:16
(hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ ID NO:20 (mSOCS6),
SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24 (mSOCS7), SEQ ID NO:26 and
SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ ID NO:30 (mSOCS9), SEQ ID
NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33 and SEQ ID NO:34
(hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12), SEQ ID NO:38 and
SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42 (hSOCS13), SEQ
ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47 (hSOCS15) or a
nucleotide sequence having at least about 15% similarity to all or a region of any of the listed
sequences or a nucleic acid molecule capable of hybridizing to any of the listed sequences under
low stringency conditions at 42°C.

Reference herein to a low stringency at 42°C includes and encompasses from at least about 1%
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative
stringency conditions may be applied where necessary, such as medium stringency, which
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M
to at least about 0.9M salt for washing conditions, or high stringency, which includes and
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.
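
By way of illustration only, the three stringency bands recited above may be expressed as a
simple lookup. The following Python sketch is not part of the specification: the function and
constant names are hypothetical, and the "at least about" ranges are treated as hard numerical
bounds, which is a simplifying assumption.

```python
from typing import Optional

# (band name, formamide range in % v/v, salt range in mol/L), taken from the ranges
# recited in the text. Treating them as hard bounds is an assumption of this sketch.
STRINGENCY_BANDS = [
    ("low", (1.0, 15.0), (1.0, 2.0)),
    ("medium", (16.0, 30.0), (0.5, 0.9)),
    ("high", (31.0, 50.0), (0.01, 0.15)),
]


def classify_stringency(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return the band whose formamide and salt ranges both contain the given values.

    Returns None when the pair of values does not fall inside any single band.
    """
    for name, (f_lo, f_hi), (s_lo, s_hi) in STRINGENCY_BANDS:
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return name
    return None
```

For example, 10% v/v formamide with 1.5M salt falls in the low stringency band, whereas 10% v/v
formamide with 0.1M salt matches no single band and the sketch returns None.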

In another embodiment, the present invention is directed to a SOCS protein or a derivative,
homologue, analogue or mimetic thereof wherein said SOCS protein is identified as follows:

human SOCS-4 characterised by EST81149, EST180909, EST182619, ya99h09,
ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06, yq12h06, yq56a06, yq60c02,
yq92g03, yq97h06, yr90f01, yt69c03, yv30a08, yv55f07, yv57h09, yv87h02, yv98e11,
yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08, yy66b02, za81f08, zb18f07,
zc06e08, zd14g06, zd51h12, zd52b09, ze25g11, ze69f02, zf54f03, zh96e07, zv66h12,
zs83a08 and zs83g08;

mouse SOCS-4 characterised by in:65f04, mf42e06, n^)10clO, nirSlgOP, and mtl9hl2; 

human SOCS-5 characterised by EST15B103, EST15B105, EST27530 and zf50f01;

mouse SOCS-5 characterised by mc55a01, mh98f09, my26h12 and ve24e06;

human SOCS-6 characterised by yf61e08, yf93a09, yg05f12, yg41f04, yg45c02,
yh11f10, yh13b05, zc35a12, ze02h08, zl09a03, zl69e10, zn39d08 and zo39e06;

mouse SOCS-6 characterised by mc04c05, md48a03, mf31d03, mh26b07, mh78e11,
mh88h09, mh94h07, mi27h04, mj29c05, mp66g04, mw75g03, va53b05, vb34h02,
vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vd08c01, vd09b12,
vd19b02, vd29a04 and vd46d06;

human SOCS-7 characterised by STS WI30171, EST00939, EST12913, yc29b05,
yp49f10, zt10f03 and zx73g04;

mouse SOCS-7 characterised by mj39a01 and vi52h07; 
mouse SOCS-8 characterised by mj6e09 and vj27a029; 

human SOCS-9 characterised by CSRL-82f2-u, EST114054, yy06b07, yy06g06,
zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662;

mouse SOCS-9 characterised by me65d05; 

human SOCS-10 characterised by aa48h10, zp35h01, zp97h12, zq08h01, zr34g05,
EST73000 and HSDHEI0C5;

mouse SOCS-10 characterised by mb14d12, mb40f06, mg89b11, mq89e12, mp03g12
and vh53c11;

human SOCS-11 characterised by zt24h06 and zr43b02;

human SOCS-13 characterised by EST59161; 

mouse SOCS-13 characterised by ma39a09, me60c05, mi78g05, mk10c11, mo48g12,
mp94a01, vb57c07 and vh07c11; and

human SOCS-14 characterised by mi75e03, vd29h11 and vd53g07;

or a derivative or homologue of the above ESTs characterised by a nucleic acid molecule
being capable of hybridizing to any of the listed ESTs under low stringency conditions
at 42°C.

In another embodiment, the nucleotide sequence encodes the following amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, A, F, Y or W;
         X8 is T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F; and
         X28 is L, I, V, M, A or P.
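
By way of illustration only, the consensus sequence defined immediately above may be written
as a regular expression and used to scan a candidate protein sequence. The Python sketch below
is not part of the specification. The names ANY, LINKER, SOCS_BOX_RE and find_socs_box are
hypothetical, "any amino acid" is assumed to mean the 20 standard one-letter residues, and the
test sequence is synthetic rather than a real SOCS protein.

```python
import re

# "Any amino acid" is restricted to the 20 standard one-letter codes (an assumption).
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
# [Xi]n and [Xj]n: a run of 1 to 50 residues of any kind.
LINKER = ANY + "{1,50}"

# One regular-expression fragment per position of the consensus, in order.
SOCS_BOX_POSITIONS = [
    "[LIVMAP]",      # X1
    ANY,             # X2
    "[PTS]",         # X3
    "[LIVMAP]",      # X4
    ANY,             # X5
    ANY,             # X6
    "[LIVAFYW]",     # X7
    "[TS]",          # X8
    "[RKH]",         # X9
    ANY,             # X10
    ANY,             # X11
    "[LIVMAP]",      # X12
    ANY,             # X13
    ANY,             # X14
    ANY,             # X15
    "[LIVMAPGCTS]",  # X16
    LINKER,          # [Xi]n
    "[LIVMAP]",      # X17
    ANY,             # X18
    ANY,             # X19
    "[LIVAP]",       # X20
    "P",             # X21
    "[LIVMAPG]",     # X22
    "[PN]",          # X23
    LINKER,          # [Xj]n
    "[LIVMAP]",      # X24
    ANY,             # X25
    ANY,             # X26
    "[YF]",          # X27
    "[LIVMAP]",      # X28
]
SOCS_BOX_RE = re.compile("".join(SOCS_BOX_POSITIONS))


def find_socs_box(protein):
    """Return (start, end) of the first consensus match in the sequence, or None."""
    match = SOCS_BOX_RE.search(protein.upper())
    return match.span() if match else None


# Synthetic sequence built to satisfy every position; it is not a real SOCS protein.
SYNTHETIC = "LAPLAALTRAALAAAL" + "GG" + "LAALPLP" + "G" + "LAAYL"
# The same sequence with the invariant proline at X21 replaced by glycine.
SYNTHETIC_NO_X21 = "LAPLAALTRAALAAAL" + "GG" + "LAALGLP" + "G" + "LAAYL"
```

A pattern of this kind only reports whether a sequence fits the stated residue classes. It says
nothing about whether a matching region functions as a SOCS box.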

The above sequence comparisons are preferably to the whole molecule but may also be to part
thereof. Preferably, the comparisons are made to a contiguous series of at least about 21
nucleotides or at least about 5 amino acids. More preferably, the comparisons are made against
at least about 21 contiguous nucleotides or at least 7 contiguous amino acids. Comparisons may
also only be made to the SOCS box region or a region encompassing the protein:molecule
interacting region such as the SH2 domain, WD-40 repeats and/or ankyrin repeats.

Still another embodiment of the present invention contemplates an isolated polypeptide or a
derivative, homologue, analogue or mimetic thereof comprising a SOCS box in its C-terminal
region.

Preferably the polypeptide further comprises a protein:molecule interacting domain such as a
protein:DNA or protein:protein interacting domain. Preferably, this domain is located N-terminal
of the SOCS box. It is particularly preferred for the protein:molecule interacting domain to be
at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats.

Preferably, the signal transduction is mediated by a cytokine selected from EPO, TPO, G-CSF,
GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.
Preferred cytokines are IL-6, LIF, OSM, IFN-γ or thrombopoietin.

More preferably, the protein comprises a SOCS box having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
         X2 is any amino acid residue;
         X3 is P, T or S;
         X4 is L, I, V, M, A or P;
         X5 is any amino acid;
         X6 is any amino acid;
         X7 is L, I, V, M, A, F, Y or W;
         X8 is C, T or S;
         X9 is R, K or H;
         X10 is any amino acid;
         X11 is any amino acid;
         X12 is L, I, V, M, A or P;
         X13 is any amino acid;
         X14 is any amino acid;
         X15 is any amino acid;
         X16 is L, I, V, M, A, P, G, C, T or S;
         [Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xi may comprise the same or different amino
         acids selected from any amino acid residue;
         X17 is L, I, V, M, A or P;
         X18 is any amino acid;
         X19 is any amino acid;
         X20 is L, I, V, M, A or P;
         X21 is P;
         X22 is L, I, V, M, A, P or G;
         X23 is P or N;
         [Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
         and wherein the sequence Xj may comprise the same or different amino
         acids selected from any amino acid residue;
         X24 is L, I, V, M, A or P;
         X25 is any amino acid;
         X26 is any amino acid;
         X27 is Y or F; and
         X28 is L, I, V, M, A or P.



Still another embodiment provides an isolated polypeptide or a derivative, homologue, analogue
or mimetic thereof comprising a sequence of amino acids substantially as set forth in SEQ ID
NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID NO:10
(hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18 (mSOCS5),
SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29 (mSOCS8), SEQ ID
NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44 (mSOCS14), SEQ ID NO:46
(mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid sequence having at least 15%
similarity to all or a part of the listed sequences.

Preferred nucleotide percentage similarities include at least about 20%, at least about 40%, at
least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%
or above such as 93%, 95%, 98% or 99%.

Preferred amino acid similarities include at least about 20%, at least about 30%, at least about
40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least
about 90%, at least about 95%, at least about 97% or 98% or above.

As stated above, similarity may be measured against an entire molecule or a region comprising
at least 21 nucleotides or at least 7 amino acids. Preferably, similarity is measured in a conserved
region such as SH2 domain, WD-40 repeats, ankyrin repeats or other protein:molecule
interacting domains or a SOCS box.

The term "similarity" includes exact identity between sequences or, where the sequence differs,
different amino acids are related to each other at the structural, functional, biochemical and/or
conformational levels.
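
By way of illustration only, the exact-identity component of similarity may be computed over
two pre-aligned regions as sketched below in Python. The sketch is not part of the specification.
The function name percent_identity and the parameter min_len are hypothetical, the sequences are
assumed to be already aligned and of equal length, and related (non-identical) residues are not
scored, so the value is a lower bound on similarity as the term is used herein.

```python
def percent_identity(seq_a, seq_b, min_len=7):
    """Percentage of positions at which two pre-aligned sequences are identical.

    min_len defaults to 7, the minimum amino acid region recited in the text;
    pass 21 when comparing nucleotide regions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(seq_a) < min_len:
        raise ValueError("region is shorter than the minimum comparison length")
    # Count positions carrying the same residue, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

Two ten-residue regions differing at a single position give 90.0, which would satisfy each of
the preferred amino acid similarity thresholds up to and including "at least about 90%".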

The nucleic acid molecule may be isolated from any animal such as humans, primates, livestock
animals (e.g. horses, cows, sheep, donkeys, pigs), laboratory test animals (e.g. mice, rats, rabbits,
hamsters, guinea pigs), companion animals (e.g. dogs, cats) or captive wild animals (e.g. deer,
foxes, kangaroos).

The term "derivatives" or its singular form "derivative", whether in relation to a nucleic acid
molecule or a protein, includes parts, mutants, fragments and analogues as well as hybrid or
fusion molecules and glycosylation variants. Particularly useful derivatives comprise single or
multiple amino acid substitutions, deletions and/or additions to the SOCS amino acid sequence.

Preferably, the derivatives have functional activity or alternatively act as antagonists or agonists.
The present invention further extends to homologues of SOCS which include the functionally or
structurally related molecule from different animal species. The present invention also
encompasses analogues and mimetics. Mimetics include a class of molecule generally but not
necessarily having a non-amino acid structure and which functionally are capable of acting in an
analogous manner to the protein for which it is a mimic, in this case, a SOCS. Mimetics may
comprise a carbohydrate, aromatic ring, lipid or other complex chemical structure or may also
be proteinaceous in composition. Mimetics as well as agonists and antagonists contemplated
herein are conveniently located through systematic searching of environments, such as coral,
marine and freshwater river beds, flora and microorganisms. This is sometimes referred to as
natural product screening. Alternatively, libraries of synthetic chemical compounds may be
screened for potentially useful molecules.

As stated above, the present invention contemplates agonists and antagonists of the SOCS. One
example of an antagonist is an antisense oligonucleotide sequence. Useful oligonucleotides are
those which have a nucleotide sequence complementary to at least a portion of the protein-coding
or "sense" sequence of the nucleotide sequence. These antisense nucleotides can be
used to effect the specific inhibition of gene expression. The antisense approach can cause
inhibition of gene expression apparently by forming an anti-parallel duplex by complementary
base pairing between the antisense construct and the targeted mRNA, presumably resulting in
hybridisation arrest of translation. Ribozymes and co-suppression molecules may also be used.
Antisense and other nucleic acid molecules may first need to be chemically modified to permit
penetration of cell membranes and/or to increase their serum half life or otherwise make them
more stable for in vivo administration. Antibodies may also act as either antagonists or agonists
although they are more useful in diagnostic applications or in the purification of SOCS proteins.
Antagonists and agonists may also be identified following natural product screening or
screening of libraries of chemical compounds or may be derivatives or analogues of the SOCS
molecules.
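
By way of illustration only, an antisense oligonucleotide complementary to a chosen portion of
a sense sequence is the reverse complement of that portion. The Python sketch below is not part
of the specification. The function name antisense_oligo and its parameters are hypothetical, a
DNA alphabet (A, C, G, T) is assumed, and the chemical modifications discussed above are not
modelled.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense, start, length):
    """Return the antisense (reverse complement) of sense[start:start + length].

    The result is written 5' to 3', so that it pairs in anti-parallel
    orientation with the targeted portion of the sense strand.
    """
    target = sense.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("requested region extends past the end of the sequence")
    if set(target) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement every base, then reverse to obtain 5'-to-3' orientation.
    return target.translate(_COMPLEMENT)[::-1]
```

For the six-base sense fragment ATGGCC the sketch returns GGCCAT.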

Accordingly, the present invention extends to analogues of the SOCS proteins of the present
invention. Analogues may be used, for example, in the treatment or prophylaxis of cytokine
mediated dysfunction such as autoimmunity, immune suppression or hyperactive immunity or
other condition including but not limited to dysfunctions in the haemopoietic, endocrine, hepatic
and neural systems. Dysfunctions mediated by other signal transducing elements such as
hormones or endogenous or exogenous molecules, antigens, microbes and microbial products,
viruses or components thereof, ions, hormones and parasites are also contemplated by the
present invention.

Analogues of the proteins contemplated herein include, but are not limited to, modification to
side chains, incorporating of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with pyridoxal-5-phosphate
followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal.

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides.
Tyrosine residues on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine
and/or D-isomers of amino acids. A list of unnatural amino acids contemplated herein is shown
in Table 3.

TABLE 3



Non-conventional amino acid | Code | Non-conventional amino acid | Code

α-aminobutyric acid | Abu | L-N-methylalanine | Nmala
α-amino-α-methylbutyrate | Mgabu | L-N-methylarginine | Nmarg
aminocyclopropane-carboxylate | Cpro | L-N-methylasparagine | Nmasn
 | | L-N-methylaspartic acid | Nmasp
aminoisobutyric acid | Aib | L-N-methylcysteine | Nmcys
aminonorbornyl-carboxylate | Norb | L-N-methylglutamine | Nmgln
 | | L-N-methylglutamic acid | Nmglu
cyclohexylalanine | Chexa | L-N-methylhistidine | Nmhis
cyclopentylalanine | Cpen | L-N-methylisoleucine | Nmile
D-alanine | Dal | L-N-methylleucine | Nmleu
D-arginine | Darg | L-N-methyllysine | Nmlys
D-aspartic acid | Dasp | L-N-methylmethionine | Nmmet
D-cysteine | Dcys | L-N-methylnorleucine | Nmnle
D-glutamine | Dgln | L-N-methylnorvaline | Nmnva
D-glutamic acid | Dglu | L-N-methylornithine | Nmorn
D-histidine | Dhis | L-N-methylphenylalanine | Nmphe
D-isoleucine | Dile | L-N-methylproline | Nmpro
D-leucine | Dleu | L-N-methylserine | Nmser
D-lysine | Dlys | L-N-methylthreonine | Nmthr
D-methionine | Dmet | L-N-methyltryptophan | Nmtrp
D-ornithine | Dorn | L-N-methyltyrosine | Nmtyr
D-phenylalanine | Dphe | L-N-methylvaline | Nmval
D-proline | Dpro | L-N-methylethylglycine | Nmetg
D-serine | Dser | L-N-methyl-t-butylglycine | Nmtbug
D-threonine | Dthr | L-norleucine | Nle
D-tryptophan | Dtrp | L-norvaline | Nva
D-tyrosine | Dtyr | α-methyl-aminoisobutyrate | Maib
D-valine | Dval | α-methyl-γ-aminobutyrate | Mgabu
D-α-methylalanine | Dmala | α-methylcyclohexylalanine | Mchexa
D-α-methylarginine | Dmarg | α-methylcyclopentylalanine | Mcpen
D-α-methylasparagine | Dmasn | α-methyl-α-naphthylalanine | Manap
D-α-methylaspartate | Dmasp | α-methylpenicillamine | Mpen
D-α-methylcysteine | Dmcys | N-(4-aminobutyl)glycine | Nglu
D-α-methylglutamine | Dmgln | N-(2-aminoethyl)glycine | Naeg
D-α-methylhistidine | Dmhis | N-(3-aminopropyl)glycine | Norn
D-α-methylisoleucine | Dmile | N-amino-α-methylbutyrate | Nmaabu
D-α-methylleucine | Dmleu | α-naphthylalanine | Anap
D-α-methyllysine | Dmlys | N-benzylglycine | Nphe
D-α-methylmethionine | Dmmet | N-(2-carbamylethyl)glycine | Ngln
D-α-methylornithine | Dmorn | N-(carbamylmethyl)glycine | Nasn
D-α-methylphenylalanine | Dmphe | N-(2-carboxyethyl)glycine | Nglu
D-α-methylproline | Dmpro | N-(carboxymethyl)glycine | Nasp
D-α-methylserine | Dmser | N-cyclobutylglycine | Ncbut
D-α-methylthreonine | Dmthr | N-cycloheptylglycine | Nchep
D-α-methyltryptophan | Dmtrp | N-cyclohexylglycine | Nchex
D-α-methyltyrosine | Dmty | N-cyclodecylglycine | Ncdec
D-α-methylvaline | Dmval | N-cyclododecylglycine | Ncdod
D-N-methylalanine | Dnmala | N-cyclooctylglycine | Ncoct
D-N-methylarginine | Dnmarg | N-cyclopropylglycine | Ncpro
D-N-methylasparagine | Dnmasn | N-cycloundecylglycine | Ncund
D-N-methylaspartate | Dnmasp | N-(2,2-diphenylethyl)glycine | Nbhm
D-N-methylcysteine | Dnmcys | N-(3,3-diphenylpropyl)glycine | Nbhe
D-N-methylglutamine | Dnmgln | N-(3-guanidinopropyl)glycine | Narg
D-N-methylglutamate | Dnmglu | N-(1-hydroxyethyl)glycine | Nthr
D-N-methylhistidine | Dnmhis | N-(hydroxyethyl)glycine | Nser
D-N-methylisoleucine | Dnmile | N-(imidazolylethyl)glycine | Nhis
D-N-methylleucine | Dnmleu | N-(3-indolylethyl)glycine | Nhtrp
D-N-methyllysine | Dnmlys | N-methyl-γ-aminobutyrate | [illegible]
N-methylcyclohexylalanine | Nmchexa | D-N-methylmethionine | Dnmmet
D-N-methylornithine | Dnmorn | N-methylcyclopentylalanine | Nmcpen
N-methylglycine | Nala | D-N-methylphenylalanine | Dnmphe
N-methylaminoisobutyrate | [illegible] | D-N-methylproline | Dnmpro
N-(1-methylpropyl)glycine | [illegible] | [illegible] | [illegible]
N-(2-methylpropyl)glycine | [illegible] | D-N-methylthreonine | Dnmthr
D-N-methyltryptophan | [illegible] | N-(1-methylethyl)glycine | Nval
D-N-methyltyrosine | [illegible] | N-methyl-α-naphthylalanine | [illegible]
D-N-methylvaline | Dnmval | N-methylpenicillamine | Nmpen
γ-aminobutyric acid | Gabu | N-(p-hydroxyphenyl)glycine | Nhtyr
L-t-butylglycine | Tbug | N-(thiomethyl)glycine | Ncys
L-ethylglycine | [illegible] | penicillamine | Pen
L-homophenylalanine | Hphe | L-α-methylalanine | Mala
L-α-methylarginine | [illegible] | [illegible] | [illegible]
L-α-methylaspartate | [illegible] | [illegible] | [illegible]
L-α-methylcysteine | [illegible] | [illegible] | Metg
L-α-methylglutamine | [illegible] | [illegible] | [illegible]
L-α-methylhistidine | [illegible] | [illegible] | [illegible]
L-α-methylisoleucine | Mile | N-(2-methylthioethyl)glycine | Nmet
L-α-methylleucine | Mleu | L-α-methyllysine | Mlys
L-α-methylmethionine | Mmet | L-α-methylnorleucine | Mnle
L-α-methylnorvaline | Mnva | L-α-methylornithine | Morn
L-α-methylphenylalanine | Mphe | L-α-methylproline | Mpro
L-α-methylserine | Mser | L-α-methylthreonine | Mthr
L-α-methyltryptophan | Mtrp | L-α-methyltyrosine | Mtyr
L-α-methylvaline | Mval | L-N-methylhomophenylalanine | Nmhphe
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine | Nnbhm | N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine | Nnbhe
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane | Nmbc | |

Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6,
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In
addition, peptides can be conformationally constrained by, for example, incorporation of Cα and
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming
an amide bond between the N and C termini, between two side chains or between a side chain
and the N or C terminus.

These types of modifications may be important to stabilise the cytokines if administered to an 
individual or for use as a diagnostic reagent. 



Other derivatives contemplated by the present invention include a range of glycosylation variants
from a completely unglycosylated molecule to a modified glycosylated molecule. Altered
glycosylation patterns may result from expression of recombinant molecules in different host cells.

Another embodiment of the present invention contemplates a method for modulating expression
of a SOCS protein in a mammal, said method comprising contacting a gene encoding a SOCS or
a factor/element involved in controlling expression of the SOCS gene with an effective amount of
a modulator of SOCS expression for a time and under conditions sufficient to up-regulate or
down-regulate or otherwise modulate expression of SOCS. An example of a modulator is a
cytokine such as IL-6 or other transcription regulators of SOCS expression.

Expression includes transcription or translation or both.

Another aspect of the present invention contemplates a method of modulating activity of SOCS
in a human, said method comprising administering to said mammal a modulating effective amount
of a molecule for a time and under conditions sufficient to increase or decrease SOCS activity.
The molecule may be a proteinaceous molecule or a chemical entity and may also be a derivative
of SOCS or a chemical analogue or truncation mutant of SOCS.

A further aspect of the present invention provides a method of inducing synthesis of a SOCS or
transcription/translation of a SOCS comprising contacting a cell containing a SOCS gene with an
effective amount of a cytokine capable of inducing said SOCS for a time and under conditions
sufficient for said SOCS to be produced. For example, SOCS1 may be induced by IL-6.

Still a further aspect of the present invention contemplates a method of modulating levels of a
SOCS protein in a cell, said method comprising contacting a cell containing a SOCS gene with an
effective amount of a modulator of SOCS gene expression or SOCS protein activity for a time and
under conditions sufficient to modulate levels of said SOCS protein.

Yet a further aspect of the present invention contemplates a method of modulating signal
transduction in a cell containing a SOCS gene comprising contacting said cell with an effective
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient to
modulate signal transduction.

Even yet a further aspect of the present invention contemplates a method of influencing interaction
between cells wherein at least one cell carries a SOCS gene, said method comprising contacting
the cell carrying the SOCS gene with an effective amount of a modulator of SOCS gene expression
or SOCS protein activity for a time sufficient to modulate signal transduction.

As stated above, the present invention contemplates a range of mimetics or small molecules
capable of acting as agonists or antagonists of the SOCS. Such molecules may be obtained from
natural product screening such as from coral, soil, plants or the ocean or antarctic environments.
Alternatively, peptide, polypeptide or protein libraries or chemical libraries may be readily
screened. For example, M1 cells expressing a SOCS do not undergo differentiation in the presence
of IL-6. This system can be used to screen molecules which permit differentiation in the presence
of IL-6 and a SOCS. A range of test cells may be prepared to screen for antagonists and agonists
for a range of cytokines. Such molecules are preferably small molecules and may be of amino acid
origin or of chemical origin. SOCS molecules interacting with signalling proteins (e.g. JAKs)
provide molecular screens to detect molecules which interfere with or promote this interaction. One
such screening protocol involves natural product screening.

Accordingly, the present invention contemplates a pharmaceutical composition comprising SOCS
or a derivative thereof or a modulator of SOCS expression or SOCS activity and one or more
pharmaceutically acceptable carriers and/or diluents. These components are referred to as the
"active ingredients". These and other aspects of the present invention apply to any SOCS
molecules such as but not limited to SOCS1 to SOCS15.

The pharmaceutical forms containing active ingredients suitable for injectable use include sterile
aqueous solutions (where water soluble) and sterile powders for the extemporaneous preparation of
sterile injectable solutions. They must be stable under the conditions of manufacture and storage and
must be preserved against the contaminating action of microorganisms such as bacteria and fungi.
The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol
(for example, glycerol, propylene glycol and liquid polyethylene glycol, and the like), suitable
mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for example, by the
use of a coating such as lecithin, by the maintenance of the required particle size in the case of
dispersion and by the use of surfactants. The prevention of the action of microorganisms can
be brought about by various antibacterial and antifungal agents, for example, parabens,
chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable
to include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use in the compositions of agents delaying
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various of the other ingredients enumerated above, as
required, followed by filter sterilization. In the case of sterile powders for the preparation of
sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique which yield a powder of the active ingredient plus any additional desired
ingredient from previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft shell
gelatin capsules, or they may be compressed into tablets. For oral therapeutic administration, the
active compound may be incorporated with excipients and used in the form of ingestible tablets,
buccal tablets, troches, capsules, elixirs, suspensions, syrups, wafers and the like. Such
compositions and preparations should contain at least 1% by weight of active compound. The
percentage of the compositions and preparations may, of course, be varied and may conveniently
be between about 5% and about 80% of the weight of the unit. The amount of active compound in
such therapeutically useful compositions is such that a suitable dosage will be obtained. Preferred
compositions or preparations according to the present invention are prepared so that an oral
dosage unit form contains between about 0.1 µg and 2000 mg of active compound.

The tablets, troches, pills, capsules and the like may also contain the components as listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like; a
lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or saccharin
may be added or a flavouring agent such as peppermint, oil of wintergreen or cherry flavouring.
When the dosage unit form is a capsule, it may contain, in addition to materials of the above type,
a liquid carrier. Various other materials may be present as coatings or to otherwise modify the
physical form of the dosage unit. For instance, tablets, pills, or capsules may be coated with
shellac, sugar or both. A syrup or elixir may contain the active compound, sucrose as a sweetening
agent, methyl and propylparabens as preservatives, a dye and flavouring such as cherry or orange
flavour. Of course, any material used in preparing any dosage unit form should be
pharmaceutically pure and substantially non-toxic in the amounts employed. In addition, the active
compound(s) may be incorporated into sustained-release preparations and formulations.

The present invention also extends to forms suitable for topical application such as creams, lotions
and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion media,
coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and the like.
The use of such media and agents for pharmaceutical active substances is well known in the art.
Except insofar as any conventional media or agent is incompatible with the active ingredient, use
thereof in the therapeutic compositions is contemplated. Supplementary active ingredients can also
be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease of
administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated; each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the unique
characteristics of the active material and the particular therapeutic effect to be achieved, and (b)
the limitations inherent in the art of compounding such an active material for the treatment of
disease in living subjects having a diseased condition in which bodily health is impaired as herein
disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 μg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 μg to about 2000 mg/ml of carrier. In the
case of compositions containing supplementary active ingredients, the dosages are determined by
reference to the usual dose and manner of administration of the said ingredients. The effective
amount may also be conveniently expressed in terms of an amount per kg of body weight. For
example, from about 0.01 ng to about 10,000 mg/kg body weight may be administered.
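
As a purely illustrative sketch of the per-kilogram arithmetic described above, the following Python fragment converts an amount expressed per kg of body weight into a total amount for a given subject and checks it against the broad range recited. The function names, the 70 kg subject and the 1 mg/kg dose are hypothetical values chosen for illustration only and are not taken from the specification.

```python
# Illustrative only: arithmetic for an amount expressed per kg of body weight.
NG_PER_MG = 1_000_000  # 1 mg = 1,000,000 ng

# Broad range recited in the text, converted to mg per kg of body weight.
MIN_DOSE_MG_PER_KG = 0.01 / NG_PER_MG  # about 0.01 ng/kg
MAX_DOSE_MG_PER_KG = 10_000.0          # about 10,000 mg/kg


def within_recited_range(dose_mg_per_kg: float) -> bool:
    """Return True if the per-kg dose lies inside the recited range."""
    return MIN_DOSE_MG_PER_KG <= dose_mg_per_kg <= MAX_DOSE_MG_PER_KG


def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total amount (mg) for one subject: per-kg dose times body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not within_recited_range(dose_mg_per_kg):
        raise ValueError("dose lies outside the recited range")
    return dose_mg_per_kg * body_weight_kg


if __name__ == "__main__":
    # Hypothetical subject and dose, for illustration only.
    print(total_dose_mg(1.0, 70.0), "mg in total")
```

The range check is deliberately kept separate from the multiplication so that the same helper can be reused when a narrower, protocol-specific range is substituted.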

The pharmaceutical composition may also comprise genetic molecules such as a vector capable of
transfecting target cells where the vector carries a nucleic acid molecule capable of modulating
SOCS expression or SOCS activity. The vector may, for example, be a viral vector. In this
regard, a range of gene therapies are contemplated by the present invention including isolating
certain cells, genetically manipulating and returning the cell to the same subject or to a genetically
related or similar subject.

Still another aspect of the present invention is directed to antibodies to SOCS and its derivatives.
Such antibodies may be monoclonal or polyclonal and may be selected from naturally occurring
antibodies to SOCS or may be specifically raised to SOCS or derivatives thereof. In the case of
the latter, SOCS or its derivatives may first need to be associated with a carrier molecule. The
antibodies and/or recombinant SOCS or its derivatives of the present invention are particularly
useful as therapeutic or diagnostic agents.

For example, SOCS and its derivatives can be used to screen for naturally occurring antibodies to
SOCS. These may occur, for example, in some autoimmune diseases. Alternatively, specific
antibodies can be used to screen for SOCS. Techniques for such assays are well known in the art
and include, for example, sandwich assays and ELISA. Knowledge of SOCS levels may be
important for diagnosis of certain cancers or a predisposition to cancers or monitoring cytokine
mediated cellular responsiveness or for monitoring certain therapeutic protocols.

Antibodies to SOCS of the present invention may be monoclonal or polyclonal. Alternatively,
fragments of antibodies may be used such as Fab fragments. Furthermore, the present invention
extends to recombinant and synthetic antibodies and to antibody hybrids. A "synthetic antibody"
is considered herein to include fragments and hybrids of antibodies. The antibodies of this aspect
of the present invention are particularly useful for immunotherapy and may also be used as a
diagnostic tool for assessing apoptosis or monitoring the program of a therapeutic regimen.

For example, specific antibodies can be used to screen for SOCS proteins. The latter would be
important, for example, as a means for screening for levels of SOCS in a cell extract or other
biological fluid or purifying SOCS made by recombinant means from culture supernatant fluid.
Techniques for the assays contemplated herein are known in the art and include, for example,
sandwich assays and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An antibody
as contemplated herein includes any antibody specific to any region of SOCS.

Both polyclonal and monoclonal antibodies are obtainable by immunization with the enzyme or
protein and either type is utilizable for immunoassays. The methods of obtaining both types of sera
are well known in the art. Polyclonal sera are less preferred but are relatively easily prepared by
injection of a suitable laboratory animal with an effective amount of SOCS, or antigenic parts
thereof, collecting serum from the animal, and isolating specific sera by any of the known
immunoadsorbent techniques. Although antibodies produced by this method are utilizable in
virtually any type of immunoassay, they are generally less favoured because of the potential
heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the ability
to produce them in large quantities and the homogeneity of the product. The preparation of
hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell line
and lymphocytes sensitized against the immunogenic preparation can be done by techniques which
are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting SOCS in a biological
sample from a subject, said method comprising contacting said biological sample with an antibody
specific for SOCS or its derivatives or homologues for a time and under conditions sufficient for
an antibody-SOCS complex to form and then detecting said complex.

The presence of SOCS may be determined in a number of ways such as by Western blotting and
ELISA procedures. A wide range of immunoassay techniques are available as can be seen by
reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653. These, of course, include both
single-site and two-site or "sandwich" assays of the non-competitive types, as well as the
traditional competitive binding assays. These assays also include direct binding of a labelled
antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for use
in the present invention. A number of variations of the sandwich assay technique exist, and all are
intended to be encompassed by the present invention. Briefly, in a typical forward assay, an
unlabelled antibody is immobilized on a solid substrate and the sample to be tested brought into
contact with the bound molecule. After a suitable period of incubation, for a period of time
sufficient to allow formation of an antibody-antigen complex, a second antibody specific to the
antigen, labelled with a reporter molecule capable of producing a detectable signal, is then added
and incubated, allowing time sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted material is washed away, and the presence of the
antigen is determined by observation of a signal produced by the reporter molecule. The results may
either be qualitative, by simple observation of the visible signal, or may be quantitated by
comparing with a control sample containing known amounts of hapten. Variations on the forward assay
include a simultaneous assay, in which both sample and labelled antibody are added simultaneously
to the bound antibody. These techniques are well known to those skilled in the art, including any
minor variations as will be readily apparent. In accordance with the present invention the sample
is one which might contain SOCS including cell extract, tissue biopsy or possibly serum, saliva,
mucosal secretions, lymph, tissue fluid and respiratory fluid. The sample is, therefore, generally a
biological sample comprising biological fluid but also extends to fermentation fluid and
supernatant fluid such as from a cell culture.
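
The quantitation step mentioned above, comparing a test signal with control samples containing known amounts of antigen, can be sketched as a simple standard-curve interpolation. The Python fragment below is a minimal illustration only: the function name, the piecewise-linear interpolation and the example standards are assumptions made for this sketch and are not taken from the specification. In practice immunoassay standard curves are often fitted with a sigmoidal model (for example, a four-parameter logistic) rather than straight-line segments.

```python
from typing import List, Tuple

# (known amount of antigen, measured reporter signal)
Standard = Tuple[float, float]


def interpolate_amount(signal: float, standards: List[Standard]) -> float:
    """Estimate the amount of antigen from a signal by piecewise-linear
    interpolation between control samples of known amount.

    Assumes the signal increases monotonically with the amount of antigen.
    """
    if len(standards) < 2:
        raise ValueError("at least two standards are required")
    points = sorted(standards, key=lambda p: p[1])  # order by signal
    lowest, highest = points[0][1], points[-1][1]
    if not lowest <= signal <= highest:
        raise ValueError("signal lies outside the range of the standards")
    for (amt0, sig0), (amt1, sig1) in zip(points, points[1:]):
        if sig0 <= signal <= sig1:
            if sig1 == sig0:
                return amt0
            fraction = (signal - sig0) / (sig1 - sig0)
            return amt0 + fraction * (amt1 - amt0)
    raise AssertionError("unreachable: signal was range-checked above")


if __name__ == "__main__":
    # Hypothetical standards: amount in arbitrary units, signal as absorbance.
    curve = [(0.0, 0.05), (1.0, 0.25), (2.0, 0.45), (4.0, 0.85)]
    print(interpolate_amount(0.35, curve))
```

Signals outside the range spanned by the standards are rejected rather than extrapolated, since a reading beyond the controls cannot be quantitated reliably from them.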

In the typical forward sandwich assay, a first antibody having specificity for the SOCS or antigenic
parts thereof, is either covalently or passively bound to a solid surface. The solid surface is
typically glass or a polymer, the most commonly used polymers being cellulose, polyacrylamide,
nylon, polystyrene, polyvinyl chloride or polypropylene. The solid supports may be in the form
of tubes, beads, discs of microplates, or any other surface suitable for conducting an immunoassay.
The binding processes are well-known in the art and generally consist of cross-linking, covalently
binding or physically adsorbing; the polymer-antibody complex is then washed in preparation for the
test sample. An aliquot of the sample to be tested is then added to the solid phase complex and
incubated for a period of time sufficient (e.g. 2-40 minutes or overnight if more convenient) and
under suitable conditions (e.g. room temperature to 37°C) to allow binding of any subunit present
in the antibody. Following the incubation period, the antibody subunit solid phase is washed and
dried and incubated with a second antibody specific for a portion of the hapten. The second
antibody is linked to a reporter molecule which is used to indicate the binding of the second
antibody to the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and then
exposing the immobilized target to specific antibody which may or may not be labelled with a
reporter molecule. Depending on the amount of target and the strength of the reporter molecule
signal, a bound target may be detectable by direct labelling with the antibody. Alternatively, a
second labelled antibody, specific to the first antibody, is exposed to the target-first antibody
complex to form a target-first antibody-second antibody tertiary complex. The complex is detected
by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most commonly used
reporter molecules in this type of assay are either enzymes, fluorophores or radionuclide containing
molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody, generally
by means of glutaraldehyde or periodate. As will be readily recognized, however, a wide variety
of different conjugation techniques exist, which are readily available to the skilled artisan.
Commonly used enzymes include horseradish peroxidase, glucose oxidase, beta-galactosidase and
alkaline phosphatase, amongst others. The substrates to be used with the specific enzymes are
generally chosen for the production, upon hydrolysis by the corresponding enzyme, of a detectable
colour change. Examples of suitable enzymes include alkaline phosphatase and peroxidase. It is
also possible to employ fluorogenic substrates, which yield a fluorescent product rather than the
chromogenic substrates noted above. In all cases, the enzyme-labelled antibody is added to the
first antibody hapten complex, allowed to bind, and then the excess reagent is washed away. A
solution containing the appropriate substrate is then added to the complex of
antibody-antigen-antibody. The substrate will react with the enzyme linked to the second antibody,
giving a qualitative visual signal, which may be further quantitated, usually
spectrophotometrically, to give an indication of the amount of hapten which was present in the
sample. "Reporter molecule" also extends to use of cell agglutination or inhibition of agglutination
such as red blood cells on latex beads, and the like.

Alternately, fluorescent compounds, such as fluorescein and rhodamine, may be chemically coupled
to antibodies without altering their binding capacity. When activated by illumination with light of
a particular wavelength, the fluorochrome-labelled antibody absorbs the light energy, inducing a
state of excitability in the molecule, followed by emission of the light at a characteristic colour
visually detectable with a light microscope. As in the EIA, the fluorescent labelled antibody is
allowed to bind to the first antibody-hapten complex. After washing off the unbound reagent, the
remaining tertiary complex is then exposed to the light of the appropriate wavelength; the
fluorescence observed indicates the presence of the hapten of interest. Immunofluorescence and
EIA techniques are both very well established in the art and are particularly preferred for the
present method. However, other reporter molecules, such as radioisotope, chemiluminescent or
bioluminescent molecules, may also be employed.

The present invention also contemplates genetic assays such as involving PCR analysis to detect
SOCS gene or its derivatives. Alternative methods or methods used in conjunction include direct
nucleotide sequencing or mutation scanning such as single stranded conformation polymorphisms
analysis (SSCP), specific oligonucleotide hybridisation, and methods such as direct protein
truncation tests.

Since cytokines are involved in transcription of some SOCS molecules, the detection of SOCS
provides surrogate markers for cytokines or cytokine activity. This may be useful in assessing
subjects with a range of conditions such as those with autoimmune diseases, for example,
rheumatoid arthritis, diabetes and stiff man syndrome amongst others.

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic acid
molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic acid
molecules of the present invention are generally mRNA.

Although the nucleic acid molecules of the present invention are generally in isolated form, they
may be integrated into or ligated to or otherwise fused or associated with other genetic molecules
such as vector molecules and in particular expression vector molecules. Vectors and expression
vectors are generally capable of replication and, if applicable, expression in one or both of a
prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli, Bacillus sp. and
Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian and insect cells.

Accordingly, another aspect of the present invention contemplates a genetic construct comprising
a vector portion and a mammalian and more particularly a human SOCS gene portion, which
SOCS gene portion is capable of encoding a SOCS polypeptide or a functional or immunologically
interactive derivative thereof.

Preferably, the SOCS gene portion of the genetic construct is operably linked to a promoter on the
vector such that said promoter is capable of directing expression of said SOCS gene portion in an
appropriate cell.

In addition, the SOCS gene portion of the genetic construct may comprise all or part of the gene 
fused to another genetic sequence such as a nucleotide sequence encoding glutathione-S- 
transferase or part thereof. 

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

The present invention also extends to any or all derivatives of SOCS including mutants, parts,
fragments, portions, homologues and analogues or their encoding genetic sequence including single
or multiple nucleotide or amino acid substitutions, additions and/or deletions to the naturally
occurring nucleotide or amino acid sequence. The present invention also extends to mimetics and
agonists and antagonists of SOCS.

The SOCS and its genetic sequence of the present invention will be useful in the generation of a
range of therapeutic and diagnostic reagents and will be especially useful in the detection of a
cytokine involved in a particular cellular response or a receptor for that cytokine. For example,
cells expressing a SOCS gene, such as M1 cells expressing the SOCS1 gene, will no longer be
responsive to a particular cytokine such as, in the case of SOCS1, IL-6. Clearly, the present
invention further contemplates cells such as M1 cells expressing any SOCS gene such as from
SOCS1 to SOCS15. Furthermore, the present invention provides the use of molecules that
regulate or potentiate the ability of therapeutic cytokines. For example, molecules which block
some SOCS activity may act to potentiate therapeutic cytokine activity (e.g. G-CSF).

Soluble SOCS polypeptides are also contemplated to be particularly useful in the treatment of
disease, injury or abnormality involving cytokine mediated cellular responsiveness such as
hyperimmunity, immunosuppression, allergies, hypertension and the like.

A further aspect of the present invention contemplates the use of SOCS or its functional
derivatives in the manufacture of a medicament for the treatment of conditions involving cytokine
mediated cellular responsiveness.

The present invention further contemplates transgenic mammalian cells expressing a SOCS gene.
Such cells are useful indicator cell lines for assaying for suppression of cytokine function. One
example is M1 cells expressing a SOCS gene. Such cell lines may be useful for screening for
cytokines or screening molecules such as naturally occurring molecules from plants, coral,
microorganisms or bio-organically active soil or water capable of acting as cytokine antagonists
or agonists.

The present invention further contemplates hybrids between different SOCS from the same or
different animal species. For example, a hybrid may be formed between all or a functional part of
mouse SOCS1 and human SOCS1. Alternatively, the hybrid may be between all or part of mouse
SOCS1 and mouse SOCS2. All such hybrids are contemplated herein and are particularly useful
in developing pleiotropic molecules.

The present invention further contemplates a range of genetic based diagnostic assays screening
for individuals with defective SOCS genes. Such mutations may result in cell types not being
responsive to a particular cytokine or resulting in over responsiveness leading to a range of
conditions. The SOCS genetic sequence can be readily verified using a range of PCR or other
techniques to determine whether a mutation is resident in the gene. Appropriate gene therapy or
other interventionist therapy may then be adopted.

The present invention is further described by the following non-limiting Examples.

Examples 1-16 relate to SOCS1, SOCS2 and SOCS3 which were identified on the basis of activity.
Examples 17-24 relate to various aspects of SOCS4 to SOCS15 which were cloned initially on the
basis of sequence similarity. Examples 25-36 relate to specific aspects of SOCS4 to SOCS15,
respectively.

EXAMPLE 1

CELL CULTURE AND CYTOKINES

The M1 cell line was derived from a spontaneously arising leukaemia in SL mice [Ichikawa, 1969].
Parental M1 cells used in this study have been in passage at the Walter and Eliza Hall Institute for
Medical Research, Melbourne, Victoria, Australia, for approximately 10 years. M1 cells were
maintained by weekly passage in Dulbecco's modified Eagle's medium (DME) containing 10%
(v/v) foetal bovine serum (FCS). Recombinant cytokines are generally available from commercial
sources or were prepared by published methods. Recombinant murine LIF was produced in
Escherichia coli and purified, as previously described [Gearing, 1989]. Purified human oncostatin
M was purchased from PeproTech Inc (Rocky Hill, NJ, USA), and purified mouse IFN-γ was
obtained from Genzyme Diagnostics (Cambridge, MA, USA). Recombinant murine
thrombopoietin was produced as a FLAG™-tagged fusion protein in CHO cells and then purified.

EXAMPLE 2
AGAR COLONY ASSAYS

In order to assay the differentiation of M1 cells in response to cytokines, 300 cells were cultured
in 35 mm Petri dishes containing 1 ml of DME supplemented with 20% (v/v) foetal calf serum (FCS),
0.3% (w/v) agar and 0.1 ml of serial dilutions of IL-6, LIF, OSM, IFN-γ, TPO or dexamethasone
(Sigma Chemical Company, St Louis, MO). After 7 days culture at 37°C in a fully humidified
atmosphere, containing 10% (v/v) CO2 in air, colonies of M1 cells were counted and classified as
differentiated if they were composed of dispersed cells or had a corona of dispersed cells around
a tightly packed centre.
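
The scoring described above reduces to simple arithmetic on colony counts. The Python fragment below is a minimal sketch of one way such counts could be tabulated across a cytokine dilution series; the function names and all of the counts are hypothetical and are shown only to illustrate the calculation, not to report any result of this Example.

```python
from typing import Dict, Tuple


def percent_differentiated(differentiated: int, undifferentiated: int) -> float:
    """Percentage of counted colonies scored as differentiated."""
    if differentiated < 0 or undifferentiated < 0:
        raise ValueError("colony counts cannot be negative")
    total = differentiated + undifferentiated
    if total == 0:
        raise ValueError("no colonies were counted")
    return 100.0 * differentiated / total


def summarise(counts: Dict[float, Tuple[int, int]]) -> Dict[float, float]:
    """Map each cytokine concentration to its percentage of differentiated colonies.

    `counts` maps concentration -> (differentiated, undifferentiated) colonies.
    """
    return {conc: percent_differentiated(d, u) for conc, (d, u) in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for a dilution series (concentration in ng/ml).
    example = {0.0: (2, 98), 1.0: (30, 70), 10.0: (80, 20), 100.0: (99, 1)}
    for conc, pct in sorted(summarise(example).items()):
        print(f"{conc:>6} ng/ml: {pct:.1f}% differentiated")
```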

EXAMPLE 3
GENERATION OF RETROVIRAL LIBRARY

A cDNA expression library was constructed from the factor-dependent haemopoietic cell line
FDC-P1, essentially as described [Rayner, 1994]. Briefly, cDNA was cloned into the retroviral
vector pRUFneo and then transfected into an amphotrophic packaging cell line (PA317).
Transiently generated virus was harvested from the cell supernatant at 48 hr posttransfection, and
used to infect ψ2 ecotropic packaging cells, to generate a high titre virus-producing cell line.

EXAMPLE 4

RETROVIRAL INFECTION OF M1 CELLS

Pools of 10^6 infected ψ2 cells were irradiated (3000 rad) and cocultivated with 10^6 M1 cells in
DME supplemented with 10% (v/v) FCS and 4 μg/ml Polybrene, for 2 days at 37°C. To select for
IL-6-unresponsive clones, retrovirally-infected M1 cells were washed once in DME, and cultured
at approximately 2x10^4 cells/ml in 1 ml agar cultures containing 400 μg/ml geneticin (GibcoBRL,
Grand Island, NY) and 100 ng/ml IL-6. The efficiency of infection of M1 cells was 1-2%, as
estimated by agar plating the infected cells in the presence of geneticin only.

EXAMPLE 5

PCR

Genomic DNA from retrovirally-infected M1 cells was digested with Sac I and 1 μg of
phenol/chloroform extracted DNA was then amplified by polymerase chain reaction (PCR).
Primers used for amplification of cDNA inserts from the integrated retrovirus were GAG3 (5'
CACGCCGCCCACGTGAAGGC 3' [SEQ ID NO:1]), which corresponds to the vector gag
sequence approximately 30 bp 5' of the multiple cloning site, and HSVTK (5'
TTCGCCAATGACAAGACGCT 3' [SEQ ID NO:2]), which corresponds to the pMC1neo
sequence approximately 200 bp 3' of the multiple cloning site. The PCR entailed an initial
denaturation at 94°C for 5 min, 35 cycles of denaturation at 94°C for 1 min, annealing at 56°C
for 2 min, and extension at 72°C for 3 min, followed by a final 10 min extension. PCR products
were gel purified and then ligated into the pGEM-T plasmid (Promega, Madison, WI), and
sequenced using an ABI PRISM Dye Terminator Cycle Sequencing Kit and a Model 373
Automated DNA Sequencer (Applied Biosystems Inc., Foster City, CA).
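
For convenience, the cycling conditions and primers above can be written down as data and checked with a few lines of Python. This is only an illustrative sketch: the class and function names are invented here, the temperature of the final extension is assumed to be 72°C because the text does not state it, and the programmed time ignores instrument ramp times.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    minutes: int


# Conditions as recited in the text.
INITIAL = Step("initial denaturation", 94, 5)
CYCLE: Tuple[Step, ...] = (
    Step("denaturation", 94, 1),
    Step("annealing", 56, 2),
    Step("extension", 72, 3),
)
N_CYCLES = 35
# Assumption: the text gives only the duration of the final extension, not its temperature.
FINAL = Step("final extension", 72, 10)

# Primer sequences, written 5' to 3' (SEQ ID NO:1 and SEQ ID NO:2).
PRIMER_SEQ_ID_1 = "CACGCCGCCCACGTGAAGGC"
PRIMER_SEQ_ID_2 = "TTCGCCAATGACAAGACGCT"


def total_minutes() -> int:
    """Programmed hold time in minutes, excluding temperature ramps."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL.minutes + N_CYCLES * per_cycle + FINAL.minutes


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    print("programmed time (min):", total_minutes())
    for label, seq in (("SEQ ID NO:1", PRIMER_SEQ_ID_1), ("SEQ ID NO:2", PRIMER_SEQ_ID_2)):
        print(label, len(seq), "nt,", gc_percent(seq), "% GC")
```

Both primers are 20 nucleotides long, and the programmed hold time comes to 225 minutes (5 + 35 x 6 + 10).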

EXAMPLE 6
CLONING OF cDNAs

Independent cDNA clones encoding mouse SOCS1 were isolated from a murine thymus cDNA
library essentially as described (Hilton et al, 1994). The nucleotide and predicted amino acid
sequences of mouse SOCS1 cDNA were compared to databases using the BLASTN and TFASTA
algorithms (Pearson and Lipman, 1988; Pearson, 1990; Altschul et al, 1990). Oligonucleotides
were designed from the ESTs encoding human SOCS1 and mouse SOCS1 and SOCS3 and used
to probe commercially available mouse thymus and spleen cDNA libraries. Sequencing was
performed using an ABI automated sequencer according to the manufacturer's instructions.

EXAMPLE 7

SOUTHERN AND NORTHERN BLOT ANALYSES AND RT-PCR

32P-labelled probes were generated using a random decanucleotide labelling kit (Bresatec,
Adelaide, South Australia) from a 600 bp Pst I fragment encoding neomycin phosphotransferase
from the plasmid pPGKneo, a 1070 bp fragment of the SOCS1 gene obtained by digestion of the 1.4
kbp PCR product with Xho I, SOCS2, SOCS3, CIS and a 1.2 kbp fragment of the chicken
glyceraldehyde 3-phosphate dehydrogenase gene [Dugaiczyk, 1983].

Genomic DNA was isolated from cells using a proteinase K-sodium dodecyl sulfate procedure
essentially as described. Fifteen micrograms of DNA was digested with either BamH I or Sac I,
fractionated on a 0.8% (w/v) agarose gel, transferred to GeneScreenPlus membrane (Du Pont
NEN, Boston MA), prehybridised, hybridised with random-primed 32P-labelled DNA fragments
and washed essentially as described [Sambrook, 1989].

Total RNA was isolated from cells and tissues using Trizol Reagent, as recommended by the
manufacturer (GibcoBRL, Grand Island, NY). When required, polyA+ mRNA was purified
essentially as described [Alexander, 1995]. Northern blots were prehybridised, hybridised with
random-primed 32P-labelled DNA fragments and washed as described [Alexander, 1995].

To assess the induction of SOCS genes by IL-6, mice (C57BL6) were injected intravenously with
5 μg IL-6 followed by harvest of the liver at the indicated timepoints after injection. M1 cells were
cultured in the presence of 20 ng/ml IL-6 and harvested at the indicated times. For RT-PCR
analysis, bone marrow cells were harvested as described (Metcalf et al, 1995) and stimulated for
1 hr at 37°C with 100 ng/ml of a range of cytokines. RT-PCR was performed on total RNA as
described (Metcalf et al, 1995). PCR products were resolved on an agarose gel and Southern blots
were hybridised with probes specific for each SOCS family member. Expression of β-actin was
assessed to ensure uniformity of amplification.

EXAMPLE 8
DNA CONSTRUCTS AND TRANSFECTION

A cDNA encoding epitope-tagged SOCS1 was generated by subcloning the entire SOCS1 coding
region into the pEF-BOS expression vector [Mizushima, 1990], engineered to encode an in-frame
FLAG epitope downstream of an initiation methionine (pF-SOCS1). Using electroporation as
described previously [Hilton, 1994], M1 cells expressing the thrombopoietin receptor (M1.mpl)
were transfected with 20 μg of Aat II-digested pF-SOCS1 expression plasmid and 2 μg of a
Sca I-digested plasmid in which transcription of a cDNA encoding puromycin N-acetyl transferase
was driven from the mouse phosphoglycerokinase promoter (pPGKPuropA). After 48 hours in
culture, transfected cells were selected with 20 μg/ml puromycin (Sigma Chemical Company, St
Louis MO), and screened for expression of SOCS1 by Western blotting, using the M2 anti-FLAG
monoclonal antibody according to the manufacturer's instructions (Eastman Kodak, Rochester
NY). In other experiments M1 cells were transfected with only the pF-SOCS1 plasmid or a
control and selected by their ability to grow in agar in the presence of 100 ng/ml of IL-6.

EXAMPLE 9

IMMUNOPRECIPITATION AND WESTERN BLOTTING

Prior to either immunoprecipitation or Western blotting, 10^7 M1 cells or their derivatives were
washed twice, resuspended in 1 ml of DME, and incubated at 37°C for 30 min. The cells were then
stimulated for 4 min at 37°C with either saline or 100 ng/ml IL-6, after which sodium vanadate
(Sigma Chemical Co., St Louis, MO) was added to a concentration of 1 mM. Cells were placed
on ice, washed once with saline containing 1 mM sodium vanadate, and then solubilised for 5 min
on ice with 300 μl 1% (v/v) Triton X-100, 150 mM NaCl, 2 mM EDTA, 50 mM Tris-HCl pH 7.4,
containing Complete protease inhibitors (Boehringer Mannheim, Mannheim, Germany) and 1 mM
sodium vanadate. Lysates were cleared by centrifugation and quantitated using a Coomassie
Protein Assay Reagent (Pierce, Rockford IL).
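
As an aside on preparing a buffer of the composition recited above, the usual dilution relationship (stock concentration x stock volume = final concentration x final volume) can be sketched in Python as follows. The final concentrations are those given in the text; the stock concentrations, the 10 ml batch size and the function names are hypothetical choices made for illustration. The protease inhibitors and sodium vanadate are omitted from the sketch.

```python
from typing import Dict, Tuple

# component -> (final concentration, stock concentration), same unit within each pair.
# Final concentrations are taken from the text; the stock concentrations are assumptions.
RECIPE: Dict[str, Tuple[float, float]] = {
    "Triton X-100 (% v/v)": (1.0, 10.0),
    "NaCl (mM)": (150.0, 5000.0),
    "EDTA (mM)": (2.0, 500.0),
    "Tris-HCl pH 7.4 (mM)": (50.0, 1000.0),
}


def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock needed, from C_stock * V_stock = C_final * V_final."""
    if stock_conc <= 0 or final_conc < 0 or final_volume_ml <= 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("a stock cannot be diluted to a higher concentration")
    return final_conc * final_volume_ml / stock_conc


def plan_buffer(final_volume_ml: float) -> Dict[str, float]:
    """Volumes (ml) of each stock, plus water to make up the final volume."""
    volumes = {
        name: stock_volume_ml(final, stock, final_volume_ml)
        for name, (final, stock) in RECIPE.items()
    }
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for name, vol in plan_buffer(10.0).items():
        print(f"{name:<24} {vol:.2f} ml")
```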

For immunoprecipitations, equal concentrations of protein extracts (1-2 mg) were incubated for
1 hr or overnight at 4°C with either 4 μg of anti-gp130 antibody (M20; Santa Cruz Biotechnology
Inc., Santa Cruz, CA) or 4 μg of anti-phosphotyrosine antibody (4G10; Upstate Biotechnology
Inc., Lake Placid NY), and 15 μl packed volume of Protein G Sepharose (Pharmacia, Uppsala,
Sweden) [Hilton et al, 1996]. Immunoprecipitates were washed twice in 1% (v/v) NP40, 150 mM
NaCl, 50 mM Tris-HCl pH 8.0, containing Complete protease inhibitors (Boehringer Mannheim,
Mannheim, Germany) and 1 mM sodium vanadate. The samples were heated for 5 min at 95°C in
SDS sample buffer (625 mM Tris-HCl pH 6.8, 0.05% (w/v) SDS, 0.1% (v/v) glycerol,
bromophenol blue, 0.125% (v/v) 2-mercaptoethanol), fractionated by SDS-PAGE and
immunoblotted as described above.

For Western blotting, 10 µg of protein from a cellular extract or material from an
immunoprecipitation reaction was loaded onto 4-15% Ready gels (Bio-Rad Laboratories, Hercules
CA), and resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE).
Proteins were transferred to PVDF membrane (Micron Separations Inc., Westborough MA) for
1 hr at 100 V. The membranes were probed with the following primary antibodies: anti-tyrosine
phosphorylated STAT3 (1:1000 dilution; New England Biolabs, Beverly, MA); anti-STAT3 (C-20;
1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz CA); anti-gp130 (M20, 1:100 dilution;
Santa Cruz Biotechnology Inc., Santa Cruz CA); anti-phosphotyrosine (horseradish
peroxidase-conjugated RC20, 1:5000 dilution; Transduction Laboratories, Lexington KY); anti-tyrosine
phosphorylated MAP kinase and anti-MAP kinase antibodies (1:1000 dilution; New England
Biolabs, Beverly, MA). Blots were visualised using peroxidase-conjugated secondary antibodies
and Enhanced Chemiluminescence (ECL) reagents according to the manufacturer's instructions
(Pierce, Rockford IL).
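
The antibody dilutions quoted above translate directly into pipetting volumes. The following is a
minimal sketch only: the 10 ml probing volume and the function name stock_volume_ul are
illustrative assumptions and are not taken from the protocol, and a 1:N dilution is assumed to
mean one part stock in N parts final volume.

```python
def stock_volume_ul(dilution_factor: int, final_volume_ml: float) -> float:
    """Return microlitres of antibody stock needed for a 1:dilution_factor dilution.

    Assumption: 1:N means 1 part stock in N parts of final volume.
    """
    if dilution_factor <= 0 or final_volume_ml <= 0:
        raise ValueError("dilution factor and final volume must be positive")
    return final_volume_ml * 1000.0 / dilution_factor


# Dilution factors as quoted in the text; the 10 ml volume is an assumed example value.
DILUTIONS = {
    "anti-phospho-STAT3": 1000,
    "anti-STAT3 (C-20)": 100,
    "anti-gp130 (M20)": 100,
    "anti-phosphotyrosine (RC20)": 5000,
}

for antibody, factor in DILUTIONS.items():
    volume = stock_volume_ul(factor, 10.0)
    print(f"{antibody}: {volume:.1f} ul of stock in 10 ml")
```

If a laboratory defines 1:N as one part stock plus N parts diluent, the denominator would be
N + 1 instead; the difference is negligible at these dilutions.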

EXAMPLE 10 
ELECTROPHORETIC MOBILITY SHIFT ASSAYS
Assays were performed as described [Novak, 1995], using the high affinity SIF (c-sis-inducible
factor) binding site m67 [Wakao, 1994]. Protein extracts were prepared from M1 cells incubated
for 4-10 min at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml IL-6 or 100
ng/ml IFN-γ. The binding reactions contained 4-6 µg protein (constant within a given
experiment), 5 ng 32P-labelled m67 oligonucleotide, and 800 ng sonicated salmon sperm DNA.
For certain experiments, protein samples were preincubated with an excess of unlabelled m67
oligonucleotide, or antibodies specific for either STAT1 (Transduction Laboratories, Lexington,
KY) or STAT3 (Santa Cruz Biotechnology Inc., Santa Cruz CA), as described [Novak, 1995].

Western blots were performed using anti-tyrosine phosphorylated STAT3 or anti-STAT3 (New
England Biolabs, Beverly, MA) or anti-gp130 (Santa Cruz Biotechnology Inc.) as described
(Nicola et al, 1996). EMSA were performed using the m67 oligonucleotide probe, as described
(Novak et al, 1995).

EXAMPLE 11

EXPRESSION CLONING OF A NOVEL SUPPRESSOR OF 
CYTOKINE SIGNAL TRANSDUCTION 

In order to identify cDNAs capable of suppressing cytokine signal transduction, an expression
cloning approach was adopted. This strategy centred on M1 cells, a monocytic leukaemia cell line
that differentiates into mature macrophages and ceases proliferation in response to the cytokines
IL-6, LIF, OSM and IFN-γ, and the steroid dexamethasone. Parental M1 cells were infected with
the RUFneo retrovirus, into which cDNAs from the factor-dependent haemopoietic cell line
FDC-P1 had been cloned. In this retrovirus, transcription of both the neomycin resistance gene and
the cloned cDNA was driven off the powerful constitutive promoter present in the retroviral LTR
(Figure 1). When cultured in semi-solid agar, parental M1 cells form large tightly packed colonies.
Upon stimulation with IL-6, M1 cells undergo rapid differentiation, resulting in the formation in
agar of only single macrophages or small dispersed clusters of cells. Retrovirally-infected M1 cells
that were unresponsive to IL-6 were selected in semi-solid agar culture by their ability to form
large, tightly packed colonies in the presence of IL-6 and geneticin. A single stable
IL-6-unresponsive clone, 4A2, was obtained after examining 10* infected cells.

A fragment of the neomycin phosphotransferase (neo) gene was used to probe a Southern blot of
genomic DNA from clone 4A2 and this revealed that the cell line was infected with a single
retrovirus containing a cDNA approximately 1.4 kbp in length (Figure 2). PCR amplification using
primers from the retroviral vector which flanked the cDNA cloning site enabled recovery of a 1.4
kbp cDNA insert, which we have named suppressor of cytokine signalling-1, or SOCS1. This PCR
product was used to probe a similar Southern blot of 4A2 genomic DNA and hybridised to two
fragments, one which corresponded to the endogenous SOCS1 gene and the other, which matched
the size of the band seen using the neo probe, corresponded to the SOCS1 cDNA cloned into the
integrated retrovirus (Figure 2). The latter was not observed in an M1 cell clone infected with a
retrovirus containing an irrelevant cDNA. Similarly, Northern blot analysis revealed that SOCS1
mRNA was abundant in the cell line 4A2, but not in the control infected M1 cell clone (Figure 2).




EXAMPLE 12 

SOCS1, SOCS2, SOCS3 AND CIS DEFINE A NEW FAMILY
OF SH2-CONTAINING PROTEINS

The SOCS1 PCR product was used as a probe to isolate homologous cDNAs from a mouse
thymus cDNA library. The sequence of the cDNAs proved to be identical to the PCR product,
suggesting that constitutive or over expression, rather than mutation, of the SOCS1 protein was
sufficient for generating an IL-6-unresponsive phenotype. Comparison of the sequence of SOCS1
cDNA with nucleotide sequence databases revealed that it was present on mouse and rat genomic
DNA clones containing the protamine gene cluster found on mouse chromosome 16. Closer
inspection revealed that the 1.4 kb SOCS1 sequence was not homologous to any of the protamine
genes, but rather represented a previously unidentified open reading frame located at the extreme
3' end of these clones (Figure 3). There were no regions of discontinuity between the sequences
of the SOCS1 cDNA and genomic locus, suggesting that SOCS1 is encoded by a single exon. In
addition to the genomic clone containing the protamine genes, a series of murine and human
expressed sequence tags (ESTs) also revealed large blocks of nucleotide sequence identity to
mouse SOCS1. The sequence information provided by the human ESTs allowed the rapid cloning
of cDNAs encoding human SOCS1.

The mouse and rat SOCS1 genes encode a 212 amino acid protein whereas the human SOCS1
gene encodes a 211 amino acid protein. Mouse, rat and human SOCS1 proteins share 95-99%
amino acid identity (Figure 9). A search of translated nucleic acid databases with the predicted
amino acid sequence of SOCS1 showed that it was most related to a recently cloned
cytokine-inducible immediate early gene product, CIS, and two classes of ESTs. Full length cDNAs
from the two classes of ESTs were isolated and found to encode proteins of similar length and
overall structure to SOCS1 and CIS. These clones were given the names SOCS2 and SOCS3. Each of
the four proteins contains a central SH2 domain and a C-terminal region termed the SOCS motif.
The SOCS1 proteins exhibit an extremely high level of amino acid sequence similarity (95-99%
identity) amongst different species. However, the forms of SOCS1, SOCS2, SOCS3 and CIS
from the same animal, while clearly defining a new family of SH2-containing proteins, exhibited
a lower amino acid identity. SOCS2 and CIS exhibit approximately 38% amino acid identity, while
the remaining members of the family share approximately 25% amino acid identity (Figure 9). The
coding regions of the genes for SOCS1 and SOCS3 appear to contain no introns while the coding
regions of the genes for SOCS2 and CIS contain one and two introns, respectively.
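
The percentage identities quoted above are computed from pairwise alignments. As a minimal
sketch of one common convention (identical residues divided by the number of aligned columns
in which neither sequence has a gap), the calculation might look as follows. The function name
and the two toy fragments are made up for illustration and are not SOCS sequences; other
conventions use a different denominator and give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical pre-aligned toy fragments, NOT real SOCS sequences.
toy_a = "MVAH-QVAAD"
toy_b = "MVARNQVAVD"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity over ungapped columns")
```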

The Genbank Accession Numbers for the sequences referred to herein are mouse SOCS1 cDNA
(U88325), human SOCS1 cDNA (U88326), mouse SOCS2 cDNA (U88327) and mouse SOCS3
cDNA (U88328).

EXAMPLE 13 

CONSTITUTIVE EXPRESSION OF SOCS1 SUPPRESSES THE
ACTION OF A RANGE OF CYTOKINES

To formally establish that the phenotype of the 4A2 cell line was directly related to expression of
SOCS1, and not to unrelated genetic changes which may have occurred independently in these
cells, a cDNA encoding an epitope-tagged version of SOCS1 under the control of the EF1α
promoter was transfected into parental M1 cells, and M1 cells expressing the receptor for
thrombopoietin, c-mpl (M1.mpl). Transfection of the SOCS1 expression vector into both cell lines
resulted in an increase in the frequency of IL-6 unresponsive M1 cells.

Multiple independent clones of M1 cells expressing SOCS1, as detected by Western blot, displayed
a cytokine-unresponsive phenotype that was indistinguishable from 4A2. Further, if transfectants
were not maintained in puromycin, expression of SOCS1 was lost over time and cells regained
their cytokine responsiveness. In the absence of cytokine, colonies derived from 4A2 and other
SOCS1 expressing clones characteristically grew to a smaller size than colonies formed by control
M1 cells (Figure 10).

The effect of constitutive SOCS1 expression on the response of M1 cells to a range of cytokines
was investigated using the 4A2 cell line and a clone of M1.mpl cells expressing SOCS1
(M1.mpl.SOCS1). Unlike parental M1 cells and M1.mpl cells, the two cell lines expressing
SOCS1 continued to proliferate and failed to form differentiated colonies in response to either
IL-6, LIF, OSM, IFN-γ or, in the case of the M1.mpl.SOCS1 cell line, thrombopoietin (Figure 4).
For both cell lines, however, a normal response to dexamethasone was observed, suggesting that
SOCS1 specifically affected cytokine signal transduction rather than differentiation per se.
Consistent with these data, while parental M1 cells and M1.mpl cells became large and vacuolated
in response to IL-6, 4A2 and M1.mpl.SOCS1 cells showed no evidence of morphological
differentiation in response to IL-6 or other cytokines (Figure 5).

EXAMPLE 14

SOCS1 INHIBITS A RANGE OF IL-6 SIGNAL TRANSDUCTION
PROCESSES, INCLUDING STAT3 PHOSPHORYLATION
AND ACTIVATION

Phosphorylation of the cell surface receptor component gp130, the cytoplasmic tyrosine kinase
JAK1 and the transcription factor STAT3 is thought to play a central role in IL-6 signal
transduction. These events were compared in the parental M1 and M1.mpl cell lines and their
SOCS1-expressing counterparts. As expected, gp130 was phosphorylated rapidly in response to
IL-6 in both parental lines; however, this was reduced five- to ten-fold in the cell lines expressing
SOCS1 (Figure 6). Likewise, STAT3 phosphorylation was also reduced by approximately ten-fold
in response to IL-6 in those cell lines expressing SOCS1 (Figure 6). Consistent with a reduction
in STAT3 phosphorylation, activation of specific STAT DNA binding complexes, as determined
by electrophoretic mobility shift assay, was also reduced. Notably, there was a reduction in the
formation of SIF-A (containing STAT3), SIF-B (STAT1/STAT3 heterodimer) and SIF-C
(containing STAT1), the three STAT complexes induced in M1 cells stimulated with IL-6 (Figure
7). Similarly, constitutive expression of SOCS1 also inhibited IFN-γ-stimulated formation of p91
homodimers (Figure 7). STAT phosphorylation and activation were not the only cytoplasmic
processes to be affected by SOCS1 expression, as the phosphorylation of other proteins, including
shc and MAP kinase, was reduced to a similar extent (Figure 7).

EXAMPLE 15

TRANSCRIPTION OF THE SOCS1 GENE IS STIMULATED BY IL-6
IN VITRO AND IN VIVO

Although SOCS1 can inhibit cytokine signal transduction when constitutively expressed in M1
cells, this does not necessarily indicate that SOCS1 normally functions to negatively regulate an
IL-6 response. In order to investigate this possibility the inventors determined whether
transcription of the SOCS1 gene is regulated in the response of M1 cells to IL-6 and, because of
the critical role IL-6 plays in regulating the acute phase response to injury and infection, the
response of the liver to intravenous injection of 5 mg IL-6. In the absence of IL-6, SOCS1 mRNA
was undetectable in either M1 cells or in the liver. However, for both cell types, a 1.4 kb SOCS1
transcript was induced within 20 to 40 minutes by IL-6 (Figure 8). For M1 cells, where the IL-6
was present throughout the experiment, the level of SOCS1 mRNA remained elevated (Figure 8).
In contrast, IL-6 was administered in vivo by a single intravenous injection and was rapidly cleared
from the circulation, resulting in a pulse of IL-6 stimulation to the liver. Consistent with this,
transient expression of SOCS1 mRNA was detectable in the liver, peaking approximately 40
minutes after injection and declining to basal levels within 4 hours (Figure 8).

EXAMPLE 16 
REGULATION OF SOCS GENES 

Since CIS was cloned as a cytokine-inducible immediate early gene, the inventors examined
whether SOCS1, SOCS2 and SOCS3 were similarly regulated. The basal pattern of expression
of the four SOCS genes was examined by Northern blot analysis of mRNA from a variety of
tissues from male and female C57BL/6 mice (Figure 11A). Constitutive expression of SOCS1 was
observed in the thymus and to a lesser extent in the spleen and the lung; SOCS2 expression was
restricted primarily to the testis and, in some animals, the liver and lung; for SOCS3 a low level of
expression was observed in the lung, spleen and thymus, while CIS expression was more
widespread, including the testis, heart, lung, kidney and, in some animals, the liver.

The inventors sought to determine whether expression of the four SOCS genes was regulated by
IL-6. Northern blots of mRNA prepared from the livers of untreated and IL-6-injected mice, or
from unstimulated and IL-6-stimulated M1 cells, were hybridised with labelled fragments of
SOCS1, SOCS2, SOCS3 and CIS cDNAs (Figure 11B). Expression of all four SOCS genes was
increased in the liver following IL-6 injection; however, the kinetics of induction appeared to differ.
Expression of SOCS1 and SOCS3 was transient in the liver, with mRNA detectable 20
minutes after IL-6 injection and declining to basal levels within 4 hours for SOCS1 and 8 hours for
SOCS3. Induction of SOCS2 and CIS mRNA in the liver followed similar initial kinetics to that
of SOCS1, but was maintained at an elevated level for at least 24 hours. A similar induction of
SOCS gene mRNA was observed in other organs, notably the lung and the spleen. In contrast,
in M1 cells, while SOCS1 and CIS mRNA were induced by IL-6, no induction of either SOCS2
or SOCS3 expression was detected. This result highlights cell type-specific differences in the
expression of the genes of SOCS family members in response to the same cytokine.

In order to examine the spectrum of cytokines that was capable of inducing transcription of the
various members of the SOCS gene family, bone marrow cells were stimulated for an hour with
a range of cytokines, after which mRNA was extracted and cDNA was synthesised. PCR was then
used to assess the expression of SOCS1, SOCS2, SOCS3 and CIS (Figure 11C). In the absence
of stimulation, little or no expression of any of the SOCS genes was detectable in bone marrow
by PCR. Stimulation of bone marrow cells with a broad array of cytokines appeared capable of
up regulating mRNA for one or more members of the SOCS family. IFNγ, for example, induced
expression of all four SOCS genes, while erythropoietin, granulocyte colony-stimulating factor,
granulocyte-macrophage colony stimulating factor and interleukin-3 induced expression of SOCS2,
SOCS3 and CIS. Interestingly, tumor necrosis factor alpha, macrophage colony-stimulating factor
and interleukin-1, which act through receptors that do not fall into the type I cytokine receptor
class, also appeared capable of inducing expression of SOCS3 and CIS, suggesting that SOCS
proteins may play a broader role in regulating signal transduction.

As constitutive expression of SOCS1 inhibited the response of M1 cells to a range of cytokines,
the inventors examined whether phosphorylation of the cell surface receptor component gp130 and
the transcription factor STAT3, which are thought to play a central role in IL-6 signal transduction,
were affected. These events were compared in the parental M1 and M1.mpl cell lines and their
SOCS1-expressing counterparts. As expected, gp130 was phosphorylated rapidly in response
to IL-6 in both parental lines; however, this was reduced in the cell lines expressing SOCS1 (Figure
12A). Likewise, STAT3 phosphorylation was also reduced in response to IL-6 in those cell lines
expressing SOCS1 (Figure 12A). Consistent with a reduction in STAT3 phosphorylation,
activation of specific STAT/DNA binding complexes, as determined by electrophoretic mobility
shift assay, was also reduced. Notably, there was a failure to form SIF-A (containing STAT3) and
SIF-B (STAT1/STAT3 heterodimer), the major STAT complexes induced in M1 cells stimulated
with IL-6 (Figure 12B). Similarly, constitutive expression of SOCS1 also inhibited
IFNγ-stimulated formation of SIF-C (STAT1 homodimer; Figure 12B). These experiments are
consistent with the proposal that SOCS1 inhibits signal transduction upstream of receptor and
STAT phosphorylation, potentially at the level of the JAK kinases.

The ability of SOCS1 to inhibit signal transduction and ultimately the biological response to
cytokines suggests that, like the SH2-containing phosphatase SHP-1 [Ihle et al, 1994; Yi et al,
1993], the SOCS proteins may play a central role in controlling the intensity and/or duration of a
cell's response to a diverse range of extracellular stimuli by suppressing the signal transduction
process. The evidence provided here indicates that the SOCS family acts in a classical negative
feedback loop for cytokine signal transduction. Like other genes such as OSM, expression of
genes encoding the SOCS proteins is induced by cytokines through the activation of STATs. Once
expressed, it is proposed that the SOCS proteins inhibit the activity of JAKs and so reduce the
phosphorylation of receptors and STATs, thereby suppressing signal transduction and any ensuing
biological response. Importantly, inhibition of STAT activation will, over time, lead to a reduction
in SOCS gene expression, allowing cells to regain responsiveness to cytokines.
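
The negative feedback loop proposed above can be illustrated with a toy two-variable model. This
is a sketch only: the equations, the rate constants and the function name simulate are
hypothetical illustration choices, not measurements or a model taken from this specification. In
the sketch, activated STAT induces SOCS, and SOCS in turn damps STAT activation.

```python
def simulate(k_induce: float, hours: float = 24.0, dt: float = 0.001) -> float:
    """Euler-integrate a toy feedback model and return the final STAT activity.

    stat: activated STAT (arbitrary units); socs: SOCS protein (arbitrary units).
    All rate constants below are hypothetical illustration values.
    """
    cytokine = 1.0   # constant stimulus
    k_act = 1.0      # STAT activation rate
    k_deact = 1.0    # STAT deactivation rate
    k_degrade = 0.5  # SOCS turnover rate
    k_inhibit = 1.0  # SOCS level giving half-maximal inhibition of activation

    stat, socs = 0.0, 0.0
    for _ in range(int(hours / dt)):
        # SOCS reduces the effective activation rate (stands in for JAK inhibition).
        d_stat = k_act * cytokine / (1.0 + socs / k_inhibit) - k_deact * stat
        # SOCS is induced in proportion to activated STAT and turned over at a fixed rate.
        d_socs = k_induce * stat - k_degrade * socs
        stat += d_stat * dt
        socs += d_socs * dt
    return stat


without_feedback = simulate(k_induce=0.0)  # SOCS is never induced
with_feedback = simulate(k_induce=2.0)     # SOCS induced by activated STAT
print(f"STAT activity without feedback: {without_feedback:.3f}")
print(f"STAT activity with feedback:    {with_feedback:.3f}")
```

Under these assumed parameters the feedback run settles at a lower STAT activity than the run in
which SOCS induction is switched off, which is the qualitative behaviour the text describes.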

EXAMPLE 17 
DATABASE SEARCHES 

The NCBI genetic sequence database (Genbank), which encompasses the major database of
expressed sequence tags (ESTs), and the TIGR database of human expressed sequence tags were
searched for sequences with similarity to a consensus SOCS box sequence using the TFASTA and
MOTIF/PATTERN algorithms [Pearson, 1990; Cockwell and Giles, 1989]. Using the software
package SRS [Etzold et al, 1996], ESTs that exhibited similarity to the SOCS box (and their
partners derived from sequencing the other end of cDNAs) were retrieved and assembled into
contigs using Autoassembler (Applied Biosystems, Foster City, CA). Consensus nucleotide
sequences derived from overlapping ESTs were then used to search the various databases using
BLASTN [Altschul et al, 1990]. Again, positive ESTs were retrieved and added to the contig.
This process was repeated until no additional ESTs could be recovered. Final consensus
nucleotide sequences were then translated using Sequence Navigator (Applied Biosystems, Foster
City, CA).

The ESTs encoding the new SOCS proteins are as follows: human SOCS4 (EST81149,
EST180909, EST182619, ya99H09, ye70co4, yh53c09, yh77gl 1, yh87h05, yi45h07, yj04e06,
yql2h06, yq56a06, yq60e02, yq92g03, yq97h06, yr90f01, yt69c03, yv30a08, yv55f07, yv57h09,
yv87h02, yv98el 1, yw68dlO, yw82a03, yxO8a07, yx72h06, yx76b09, yy37h08, yy66b02, za81fD8,
zbl8«y7, zc06e08, zdl4g06, zd51hl2, zd52b09, ze25gl 1, ze69fD2, zf54«)3, zh96e07, zv66hl2,
zs83a08 and zs83g08); mouse SOCS-4 (mc65f04, mf42e06, mplOclO, inr8lg09, and mtl9hl2);
human SOCS-5 (EST15B103, EST15B105, EST27530 and zfSOfOl); mouse SOCS-5
(mc55a01, mh98f09, my26hl2 and ve24e06); human SOCS-6 (yf61e08, yf93a09, yg05fl2,
yg41f04, yg45c02, yhlinO, yhl3b05, zc35al2, ze02h08, zl09a03, zl69el0, zn39d08 and
zo39e06); mouse SOCS-6 (mc04c05, md48a03, mf31d03, mh26b07, mh78ell, nih88h09,
mh94h07, mi27h04 and inj29c05, mp66g04, mw75g03, va53b05, vb34h02, vc55d07, vc59e05,
vc67d03, vc68dl0, vc97h01, vc99c08, vd07h03, vdOScOl, vd09bl2, vdl9b02, vd29a04 and
vd46d06); human SOCS-7 (STS WI30171, EST00939, EST1291 3, yc29b05, yp49n0, ztlOfOS
and zx73g04); mouse SOCS-7 (iry39a01 and vi52hD7); mouse SOCS-8 (mj6c09 and vj27a029);
human SOCS-9 (CSRL-82f2-u, ESTl 14054, yy06b07, yy06g06, zr40c09, zr72h01, yx92c08,
yx93b08 and hfe0662); mouse SOCS-9 (nie65d05); human SOCS-10 (aa48hl0, zp35h01,
zp97hl2, zqOShOl, zr34g05, EST73000 and HSDHEI005); mouse SOCS-10 (mbl4dl2,
mb40fl)6, mg89bll, mq89el2, mp03gl2 and vh53cll); human SOCS-11 (zt24h06 and
zr43b02); human SOCS-13 (EST59161); mouse SOCS-13 (ma39a09, me60c05, mi78g05,
mklOeH, mo48gl2, mp94a01, vb57c07 and vh07cll); human SOCS-14 (mi75e03, vd29hll
and vd53g07).
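
The iterative retrieval described in this Example (search, add positive ESTs to the contig, search
again, and stop when no additional ESTs are recovered) amounts to expanding a set to a fixed
point. The following is a minimal sketch under stated assumptions: a crude shared k-mer test
stands in for the TFASTA and BLASTN searches, and the seed, the EST names and the toy sequences
are all hypothetical.

```python
def shares_kmer(query: str, subject: str, k: int = 8) -> bool:
    """Crude stand-in for a similarity search: do the two sequences share any k-mer?"""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return any(subject[i:i + k] in kmers for i in range(len(subject) - k + 1))


def collect_ests(seed: str, database: dict[str, str], k: int = 8) -> set[str]:
    """Re-search with every newly recovered EST until no additional ESTs are found."""
    recovered: set[str] = set()
    queries = [seed]
    while queries:
        query = queries.pop()
        for est_id, sequence in database.items():
            if est_id not in recovered and shares_kmer(query, sequence, k):
                recovered.add(est_id)
                queries.append(sequence)  # each new hit becomes a query in a later round
    return recovered


# Hypothetical toy data: est_1 overlaps the seed, est_2 overlaps only est_1, est_3 is unrelated.
SEED = "GATTACAGGCTA"
TOY_DB = {
    "est_1": "GATTACAGGCTACCGTTAGC",
    "est_2": "CCGTTAGCAATGGCTTAC",
    "est_3": "TTTTTTTTGGGGGGGGAAAA",
}

print(sorted(collect_ests(SEED, TOY_DB)))
```

In the toy data est_2 is only reachable through est_1, which is why a single search with the
seed alone would not be sufficient.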

EXAMPLE 18 
cDNA CLONING 

Based on the consensus sequences derived from overlapping ESTs, oligonucleotides were
designed that were specific to various members of the SOCS family. As described above,
oligonucleotides were labelled and used to screen commercially available genomic and cDNA
libraries cloned into λ bacteriophage. Genomic and/or cDNA clones covering the entire coding
region of mouse SOCS4, mouse SOCS5 and mouse SOCS6 were isolated. The entire gene for
SOCS15 is on the human 12p13 BAC (Genbank Accession Number HSU47924) and the mouse
chromosome 6 BAC (Genbank Accession Number AC002393). Partial cDNAs for mouse SOCS7,
SOCS9, SOCS10, SOCS11, SOCS12, SOCS13 and SOCS14 were also isolated.

EXAMPLE 19 

NORTHERN BLOTS AND rtPCR

Northern blots were performed as described above. The sources of hybridisation probes were as
follows: (i) the entire coding region of the mouse SOCS1 cDNA, (ii) a 1059 bp PCR product
derived from the coding region of SOCS5 upstream of the SH2 domain, (iii) the entire coding region
of the mouse SOCS6 cDNA, (iv) a 790 bp PCR product derived from the coding region of a
partial SOCS7 cDNA and (v) a 1200 bp Pst I fragment of the chicken glyceraldehyde 3-phosphate
dehydrogenase (GAPDH) cDNA.

EXAMPLE 20 

ADDITIONAL MEMBERS OF SOCS FAMILY

SOCS1, SOCS2 and SOCS3 are members of the SOCS protein family identified in Examples 1-16.
Each contains a central SH2 domain and a conserved motif at the C-terminus, named the SOCS
box. In order to isolate further members of this protein family, various DNA databases were
searched with the amino acid sequence corresponding to conserved residues of the SOCS box.
This search revealed the presence of human and mouse ESTs encoding twelve further members
of the SOCS protein family (Figure 13). Using this sequence information cDNAs encoding
SOCS4, SOCS5, SOCS6, SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13, SOCS14 and
SOCS15 have been isolated. Further analysis of contigs derived from ESTs and cDNAs revealed
that the SOCS proteins could be placed into three groups according to their predicted structure
N-terminal of the SOCS box. The three groups are those with (i) SH2 domains, (ii) WD-40
repeats and (iii) ankyrin repeats.




EXAMPLE 21 
SOCS PROTEINS WITH SH2 DOMAINS

Eight SOCS proteins with SH2 domains have been identified. These include SOCS1, SOCS2,
SOCS3, SOCS5, SOCS9, SOCS11 and SOCS14 (Figure 13). Full length cDNAs were isolated
for mouse SOCS5 and SOCS14 and partial clones encoding mouse SOCS9 and SOCS14. Analysis
of primary amino acid sequence and genomic structure suggests that pairs of these proteins (SOCS1
and SOCS3, SOCS2 and CIS, SOCS5 and SOCS14, and SOCS9 and SOCS11) are most closely
related (Figure 13). Indeed, the SH2 domains of SOCS5 and SOCS14 are almost identical (Figure
13B), and unlike CIS, SOCS1, SOCS2 and SOCS3, SOCS5 and SOCS14 have an extensive,
though less well conserved, N-terminal region preceding their SH2 domains (Figure 13A).

EXAMPLE 22 
SOCS PROTEINS WITH WD-40 REPEATS 

Four SOCS proteins with WD-40 repeats were identified. As with the SOCS proteins with SH2
domains, pairs of these proteins appeared to be closely related. Full length cDNAs of mouse
SOCS4 and SOCS6 were isolated and shown to encode proteins containing eight WD-40 repeats
N-terminal of the SOCS box (Figure 13), and SOCS4 and SOCS6 share 65% amino acid similarity.

SOCS15 was recognised as an open reading frame upon sequencing BACs from human
chromosome 12p13 and the syntenic region of mouse chromosome 6 [Ansari-Lari et al, 1997].
In the human, chimp and mouse, SOCS15 is encoded by a gene with two coding exons that lies
within a few hundred base pairs of the 3' end of the triose phosphate isomerase (TPI) gene, but
which is encoded on the opposite strand to TPI (9). In addition to a C-terminal SOCS box, the
SOCS15 protein contains four WD-40 repeats. Interestingly, within the EST databases, there are
sequences of a nematode, an insect and a fish relative of SOCS13. SOCS15 appears most closely
related to SOCS13.

EXAMPLE 23

SOCS PROTEINS WITH ANKYRIN REPEATS 

Three SOCS proteins with ankyrin repeats were identified. Analysis of partial cDNAs of mouse
SOCS7, SOCS10 and SOCS12 demonstrated the presence of multiple ankyrin repeats.
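
The three structural groups described in Examples 20 to 23 can be summarised as a simple lookup
table. This is an illustrative sketch only: the names SOCS_GROUPS and group_of are hypothetical.
Two placements are inferences from the text rather than explicit statements: CIS is placed in
the SH2 group because it is described as containing a central SH2 domain, and SOCS13 is placed
in the WD-40 group because SOCS15 is described as most closely related to it. SOCS8 is left
unassigned because these Examples do not place it in a group.

```python
# Grouping by predicted structure N-terminal of the SOCS box, as read from Examples 20-23.
# CIS and SOCS13 placements are inferred from the text (see lead-in), not stated outright.
SOCS_GROUPS = {
    "SH2 domain": ["CIS", "SOCS1", "SOCS2", "SOCS3", "SOCS5", "SOCS9", "SOCS11", "SOCS14"],
    "WD-40 repeats": ["SOCS4", "SOCS6", "SOCS13", "SOCS15"],
    "ankyrin repeats": ["SOCS7", "SOCS10", "SOCS12"],
}


def group_of(protein: str) -> str:
    """Return the structural group of a family member, or raise KeyError if unassigned."""
    for group, members in SOCS_GROUPS.items():
        if protein in members:
            return group
    raise KeyError(f"{protein} is not assigned to a group in this sketch")


for name in ("SOCS1", "SOCS6", "SOCS7"):
    print(f"{name}: {group_of(name)}")
```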

EXAMPLE 24 
EXPRESSION PATTERN OF SOCS PROTEINS 

The expression of mRNA from representative members of each class of SOCS proteins - SOCS1
and SOCS5 from the SH2 domain group, SOCS6 from the WD-40 repeat group and SOCS7 from
the ankyrin repeat group - was examined. As shown above, SOCS1 mRNA is found in abundance
in the thymus and at lower levels in other adult tissues.

Since transcription of the SOCS1 gene is induced by cytokines, the inventors sought to determine
whether levels of SOCS5, SOCS6 and SOCS7 mRNA increased upon cytokine stimulation. In the
livers of mice injected with IL-6, SOCS1 mRNA is detectable after 20 min and decreases to
background levels within 2 hours. In contrast, the kinetics of SOCS5 mRNA expression are quite
different, with the transcript being detectable only 12 to 24 hours after IL-6 injection. SOCS6
mRNA appears to be expressed constitutively while SOCS7 mRNA was not detected in the liver
either before injection of IL-6 or at any time after injection.

Expression of these genes was also examined after cytokine stimulation of the factor-dependent
cell line FDC-P1 engineered to express bcl-w. Again, while SOCS6 mRNA was expressed
constitutively.

EXAMPLE 25 
SOCS4 

Mouse and human SOCS4 were recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from mouse and human SOCS4 cDNAs are tabulated
below (Tables 4.1 and 4.2). Using sequence information derived from mouse ESTs, several
oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus
cDNA library cloned into λ-bacteriophage. Two cDNAs encoding mouse SOCS4 were isolated
and sequenced in their entirety (Figure 15) and shown to overlap the mouse ESTs identified in
the database (Table 4.1 and Figure 17). These cDNAs include a region of 5' untranslated region,
the entire mouse SOCS4 coding region and a region of 3' untranslated region (Figure 17).
Analysis of the sequence confirms that the SOCS4 cDNA encodes a SOCS box at its C-terminus
and a series of 8 WD-40 repeats before the SOCS box (Figures 17 and 16). The relationship of
the two sequence contigs of human SOCS4 (h4.1 and h4.2) to the experimentally determined
mouse SOCS4 cDNA sequence is shown in Figure 17. The nucleotide sequence of the two
human contigs is listed in Figure 18.

SEQ ID NOs: 13 and 14 represent the nucleotide sequence of murine SOCS4 and the corresponding
amino acid sequence. SEQ ID NOs: 15 and 16 are SOCS4 cDNA human contigs h4.1 and h4.2,
respectively.

EXAMPLE 26 
SOCS5 

Mouse and human SOCS5 were recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from mouse and human SOCS5 cDNAs are tabulated
below (Tables 5.1 and 5.2). Using sequence information derived from mouse and human ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library, a mouse genomic DNA library and a human thymus cDNA library cloned
into λ-bacteriophage. A single genomic DNA clone (57-2) and a cDNA clone (5-3-2) encoding
mouse SOCS5 were isolated and sequenced in their entirety and shown to overlap with the mouse
ESTs identified in the database (Figures 19 and 20A). The entire coding region, in addition to
regions of 5' and 3' untranslated sequence, of mouse SOCS5 appears to be encoded on a single
exon (Figure 19). Analysis of the sequence (Figure 20) confirms that SOCS5 genomic and cDNA
clones encode a protein with a SOCS box at its C-terminus in addition to an SH2 domain (Figures
19 and 20B). The relationship of the human SOCS5 contig (h5.1; Figure 21) derived from
analysis of cDNA clone 5-94-2 and the human SOCS5 ESTs (Table 5.2) to the mouse SOCS5
DNA sequence is shown in Figure 19. The nucleotide sequence and corresponding amino acid
sequence of murine SOCS5 are shown in SEQ ID NOs: 17 and 18, respectively. The human
SOCS5 nucleotide sequence is shown in SEQ ID NO: 19.


EXAMPLE 27 
SOCS6 

Mouse and human SOCS6 were recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from mouse and human SOCS6 cDNAs are tabulated
below (Tables 6.1 and 6.2). Using sequence information derived from mouse ESTs, several
oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus
cDNA library. Eight cDNA clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N, 6-5N)
encoding mouse SOCS6 were isolated and sequenced in their entirety and shown to overlap
with the mouse ESTs identified in the database (Figures 22 and 23A). Analysis of the sequence
(Figure 23) confirms that the mouse SOCS6 cDNA clones encode a protein with a SOCS box at
its C-terminus in addition to eight WD-40 repeats (Figures 22 and 23B). The relationship of the
human SOCS6 contigs (h6.1 and h6.2; Figure 24) derived from analysis of human SOCS6 ESTs
(Table 6.2) to the mouse SOCS6 DNA sequence is shown in Figure 22. The nucleotide and
corresponding amino acid sequences of murine SOCS6 are shown in SEQ ID NOs: 20 and 21,
respectively. SOCS6 human contigs h6.1 and h6.2 are shown in SEQ ID NOs: 22 and 23,
respectively.

EXAMPLE 28 
SOCS7

Mouse and human SOCS7 were recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from mouse and human SOCS7 cDNAs are tabulated
below (Tables 7.1 and 7.2). Using sequence information derived from mouse ESTs, several
oligonucleotides were designed and used to screen, in the conventional manner, a mouse thymus
cDNA library. One cDNA clone (74-10A-11) encoding mouse SOCS7 was isolated
and sequenced in its entirety and shown to overlap with the mouse ESTs identified in the database
(Figures 25 and 26A). Analysis of the sequence (Figure 26) suggests that mouse SOCS7 encodes
a protein with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figures 25 and
26B). The relationship of the human SOCS7 contigs (h7.1 and h7.2; Figure 27) derived from
analysis of human SOCS7 ESTs (Table 7.2) to the mouse SOCS7 DNA sequence is shown in
Figure 25. The nucleotide and corresponding amino acid sequences of murine SOCS7 are shown
in SEQ ID NOs: 24 and 25, respectively. The nucleotide sequences of SOCS7 human contigs h7.1
and h7.2 are shown in SEQ ID NOs: 26 and 27, respectively.

EXAMPLE 29
SOCS8

ESTs derived from mouse SOCS8 cDNAs are tabulated below (Table 8.1). As described for other
members of the SOCS family, it is possible to isolate cDNAs for mouse SOCS8 using sequence
information derived from mouse ESTs. The relationship of the ESTs to the predicted coding
region of SOCS8 is shown in Figure 28, with the nucleotide sequence obtained from the ESTs
shown in Figure 29A and the partial amino acid sequence of SOCS8 shown in Figure 29B. The
nucleotide sequence and corresponding amino acid sequence of murine SOCS8 are shown in
SEQ ID NOs: 28 and 29, respectively.

EXAMPLE 30 
SOCS9 

Mouse and human SOCS-9 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS9 cDNAs are
tabulated below (Tables 9.1 and 9.2). The relationship of the mouse SOCS9 contig (m9.1; Figure
9.2) derived from analysis of the mouse SOCS9 EST (Table 9.1) to the human SOCS-9 DNA
contig (h9.1; Figure 32) derived from analysis of human SOCS9 ESTs (Table 9.2) is shown in
Figure 31. Analysis of the sequence (Figure 32) indicates that the human SOCS9 cDNA encodes
a protein with a SOCS box at its C-terminus, in addition to an SH2 domain (Figure 30). The
nucleotide sequence of murine SOCS9 cDNA is shown in SEQ ID NO:30. The nucleotide
sequence of human SOCS9 cDNA is shown in SEQ ID NO:31.

EXAMPLE 31 
SOCS10

Mouse and human SOCS10 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS10 cDNAs are
tabulated below (Tables 10.1 and 10.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24) encoding mouse
SOCS10 were isolated, sequenced in their entirety and shown to overlap with the mouse and
human ESTs identified in the database (Figures 33 and 34). Analysis of the sequence (Figure 34)
indicates that the mouse SOCS10 cDNA clone is not full length but that it does encode a protein
with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 33). The
relationship of the human SOCS10 contigs (h10.1 and h10.2; Figure 35) derived from analysis of
human SOCS10 ESTs (Table 10.2) to the mouse SOCS10 DNA sequence is shown in Figure 33.
Comparison of mouse cDNA clones and ESTs with human ESTs suggests that the 3' untranslated
regions of mouse and human SOCS10 differ significantly. The nucleotide sequence of murine
SOCS10 is shown in SEQ ID NO:32 and the nucleotide sequences of SOCS10 human contigs h10.1
and h10.2 are shown in SEQ ID NOs:33 and 34, respectively.

EXAMPLE 32 
SOCS11

Human SOCS11 was recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from human SOCS11 cDNAs are tabulated below
(Tables 11.1 and 11.2). The relationship of the human SOCS11 contig (h11.1; Figure 36A, B)
derived from analysis of ESTs (Table 11.2) to the predicted encoded protein is shown in Figure 37.
Analysis of the sequence indicates that the human SOCS11 cDNA encodes a protein with a SOCS
box at its C-terminus, in addition to an SH2 domain (Figures 37 and 36B). The nucleotide
sequence and corresponding amino acid sequence of human SOCS11 are represented in SEQ ID
NOs:35 and 36, respectively.

EXAMPLE 33 
SOCS12 

Mouse and human SOCS-12 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS12 cDNAs are
tabulated below (Tables 12.1 and 12.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24) encoding mouse
SOCS12 were isolated, sequenced in their entirety and shown to overlap with the mouse and
human ESTs identified in the database (Figures 38 and 39). Analysis of the sequence (Figures 39
and 40) indicates that the SOCS12 cDNA clone encodes a protein with a SOCS box at its
C-terminus, in addition to several ankyrin repeats (Figure 38). The relationship of the human
SOCS12 contigs (h12.1 and h12.2; Figure 40) derived from analysis of human SOCS12 ESTs
(Table 12.2) to the mouse SOCS12 DNA sequence is shown in Figure 38. Comparison of mouse
cDNA clones and ESTs with human ESTs suggests that the 3' untranslated regions of mouse and
human SOCS12 differ significantly. The nucleotide sequence of SOCS12 is shown in SEQ ID
NO:37. The nucleotide sequences of human SOCS12 contigs h12.1 and h12.2 are shown in SEQ
ID NOs:38 and 39, respectively.

EXAMPLE 34 
SOCS13 

Mouse and human SOCS-13 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS13 cDNAs are
tabulated below (Tables 13.1 and 13.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus and a mouse embryo cDNA library. Three cDNA clones (62-1, 62-6-7 and 62-14)
encoding mouse SOCS13 were isolated, sequenced in their entirety and shown to overlap with the
mouse ESTs identified in the database (Figures 41 and 42A). Analysis of the sequence (Figure 42)
indicates that the mouse SOCS13 cDNA encodes a protein with a SOCS box at its C-terminus,
in addition to a potential WD-40 repeat (Figures 41 and 42B). The relationship of the human
SOCS13 contigs (h13.1 and h13.2; Figure 43) derived from analysis of human SOCS13 ESTs
(Table 13.2) to the mouse SOCS13 DNA sequence is shown in Figure 41. The nucleotide
sequence and corresponding amino acid sequence of murine SOCS13 are shown in SEQ ID
NOs:40 and 41, respectively. The nucleotide sequence of human SOCS13 contig h13.1 is shown
in SEQ ID NO:42.

EXAMPLE 35 
SOCS14

Mouse and human SOCS-14 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS14 cDNAs are
tabulated below (Tables 14.1 and 14.2). Using sequence information derived from mouse and
human ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus cDNA library, a mouse genomic DNA library and a human thymus
cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2) and a cDNA
clone (5-3-2) encoding mouse SOCS14 were isolated, sequenced in their entirety and shown
to overlap with the mouse ESTs identified in the database (Figures 44 and 45A). The entire
coding region, in addition to a region of 5' and 3' untranslated regions, of mouse SOCS14 appears
to be encoded on a single exon (Figure 44). Analysis of the sequence (Figure 45) confirms that
SOCS14 genomic and cDNA clones encode a protein with a SOCS box at its C-terminus in
addition to an SH2 domain (Figures 44 and 45B). The relationship of the human SOCS14 contig
(h14.1; Figure 14.3) derived from analysis of cDNA clone 5-94-2 and the human SOCS14 ESTs
(Table 14.2) to the mouse SOCS14 DNA sequence is shown in Figure 44.

The nucleotide sequence and corresponding amino acid sequence of murine SOCS14 are shown
in SEQ ID NOs: 43 and 44, respectively.




EXAMPLE 36
SOCS15

Mouse and human SOCS15 were recognized through searching DNA databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS15 cDNAs are
tabulated below (Tables 15.1 and 15.2), as are a mouse and a human BAC that contain the entire
mouse and human SOCS-15 genes. Using sequence information derived from the ESTs and the
BACs it is possible to predict the entire amino acid sequence of SOCS15 and, as described for the
other SOCS genes, it is feasible to design specific oligonucleotide probes to allow cDNAs to be
isolated. The relationship of the BACs to the ESTs is shown in Figure 46 and the nucleotide and
predicted amino acid sequences of SOCS-15, derived from the mouse and human BACs, are
shown in Figures 47 and 48. The nucleotide sequence and corresponding amino acid sequence of
murine SOCS15 are shown in SEQ ID NOs:46 and 47, respectively. The nucleotide and
corresponding amino acid sequences of human SOCS15 are shown in SEQ ID NOs:48 and 49,
respectively.

EXAMPLE 37
SOCS INTERACTION WITH JAK2 KINASE

This Example shows interaction between SOCS and JAK2 kinase. Interaction is mediated via
the SH2 domain of SOCS1, 2, 3 and CIS. The interaction resulted in inhibition of JAK2 kinase
activity by SOCS1 (Figure 49). General interaction between JAK2 and SOCS1, 2, 3 and CIS is
shown in Figure 50.

The following methods are employed:

Immunoprecipitation: Cos 6 cells were transiently transfected by electroporation and cultured
for 48 hours. Cells were then lysed on ice in lysis buffer (50 mM Tris/HCl, pH 7.5, 150 mM
NaCl, 1% v/v Triton X-100, 1 mM EDTA, 1 mM NaF, 1 mM Na3VO4) with the addition of
Complete protease inhibitors (Boehringer Mannheim), centrifuged at 4°C (14,000 x g, 10 min) and
the supernatant retained for immunoprecipitation. JAK2 proteins were immunoprecipitated using
5 µl anti-JAK2 antibody (UBI). Antigen-antibody complexes were recovered using protein
A-Sepharose (30 µl of a 50% slurry).
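
The final concentrations of the lysis buffer can be turned into pipetting volumes with the usual C1 x V1 = C2 x V2 relation. The following is a minimal sketch only: the final concentrations are those given above, whereas every stock concentration (and the function name) is an assumption chosen for illustration and is not taken from the specification.

```python
# Sketch: stock volumes (ml) needed for a given volume of lysis buffer.
# Final concentrations follow the text; STOCK_MM and TRITON_STOCK_PCT are
# hypothetical stock solutions, not values from the specification.

FINAL_MM = {"Tris/HCl pH 7.5": 50, "NaCl": 150, "EDTA": 1, "NaF": 1, "Na3VO4": 1}
STOCK_MM = {"Tris/HCl pH 7.5": 1000, "NaCl": 5000, "EDTA": 500, "NaF": 500, "Na3VO4": 200}
TRITON_FINAL_PCT = 1.0   # % v/v, from the text
TRITON_STOCK_PCT = 10.0  # % v/v, assumed stock


def lysis_buffer_recipe(total_ml):
    """Return a dict of component -> volume in ml (C1*V1 = C2*V2)."""
    recipe = {name: total_ml * FINAL_MM[name] / STOCK_MM[name] for name in FINAL_MM}
    recipe["Triton X-100"] = total_ml * TRITON_FINAL_PCT / TRITON_STOCK_PCT
    # Whatever volume is left is made up with water.
    recipe["water"] = total_ml - sum(recipe.values())
    return recipe


if __name__ == "__main__":
    for component, volume in lysis_buffer_recipe(10.0).items():
        print(f"{component:16s} {volume:6.2f} ml")
```

Protease inhibitors are left out of the calculation because the text adds them separately according to the supplier's format.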

Western blotting: Immunoprecipitates were analysed by sodium dodecyl sulphate (SDS)-
polyacrylamide gel electrophoresis (PAGE) under reducing conditions. Protein was then
electrophoretically transferred to nitrocellulose, blocked overnight in 10% w/v skim-milk and
washed in PBS/0.1% v/v Tween-20 (Sigma) (wash buffer) prior to incubation with either
anti-phosphotyrosine antibody (4G10) (1:5000, UBI), anti-FLAG antibody (1.6 µg/ml) or anti-JAK2
antibody (1:2000, UBI) diluted in wash buffer/1% w/v BSA for 2 hr. Nitrocellulose blots were
washed and primary antibody detected with either peroxidase-conjugated sheep anti-rabbit
immunoglobulin (1:5000, Silenus) or peroxidase-conjugated sheep anti-mouse immunoglobulin
(1:5000, Silenus) diluted in wash buffer/1% w/v BSA. Blots were washed and antibody binding
visualised using the enhanced chemiluminescence (ECL) system (Amersham, UK) according to the
manufacturer's instructions.
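
The antibody dilutions quoted above translate into small pipetting volumes. The sketch below assumes that "1:N" means one part antibody in N parts total volume, and that the anti-FLAG stock concentration is 1 mg/ml; both are illustrative assumptions, as are the function names, and none of them comes from the specification.

```python
# Sketch: antibody volumes for the dilutions quoted in the Western blotting method.
# Assumptions (not from the specification): "1:N" means 1 part in N total,
# and the concentration-based stock is supplied at a known mg/ml value.


def ratio_dilution_ul(dilution_factor, final_volume_ml):
    """Antibody volume in µl for a 1:dilution_factor dilution."""
    return final_volume_ml * 1000.0 / dilution_factor


def concentration_dilution_ul(final_ug_per_ml, final_volume_ml, stock_mg_per_ml):
    """Antibody volume in µl to reach final_ug_per_ml (1 mg/ml equals 1 µg/µl)."""
    return final_ug_per_ml * final_volume_ml / stock_mg_per_ml


if __name__ == "__main__":
    volume_ml = 10.0  # hypothetical incubation volume
    print("4G10, 1:5000      :", ratio_dilution_ul(5000, volume_ml), "µl")
    print("anti-JAK2, 1:2000 :", ratio_dilution_ul(2000, volume_ml), "µl")
    print("anti-FLAG, 1.6 µg/ml:", concentration_dilution_ul(1.6, volume_ml, 1.0), "µl")
```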

In vitro kinase assay: An in vitro kinase assay was performed to assess intrinsic JAK2 kinase
catalytic activity. JAK2 proteins were immunoprecipitated as described, washed twice in kinase
assay buffer (50 mM NaCl, 5 mM MgCl2, 5 mM MnCl2, 1 mM NaF, 1 mM Na3VO4, 10 mM
HEPES, pH 7.4) and suspended in an equal volume of kinase buffer containing 0.25 µCi/ml
(γ-32P)-ATP (30 min, room temperature). Excess (γ-32P)-ATP was removed and the
immunoprecipitates analysed by SDS/PAGE under reducing conditions. Gels were subjected to
a mild alkaline hydrolysis by treatment with 1 M KOH (55°C, 2 hours) to remove phosphoserine
and phosphothreonine. Radioactive bands were visualised with IMAGEQUANT software on a
PhosphorImager system (Molecular Dynamics, Sunnyvale, CA, USA).

EXAMPLE 38 
MAKING SOCS-1 KNOCKOUT CONSTRUCTS 

Diagrams of plasmid constructs and knockout constructs are shown in Figures 51-53. The
genomic SOCS-1 clone 95-11-10 was digested with the restriction enzymes BamHI and EcoRI
to obtain a 3.6 kb DNA fragment 3' of the coding region (SOCS-1 exon), which was used as the
3' arm in the SOCS-1 knockout vectors. The ends of this fragment were then blunted. This
fragment was then ligated into the following vectors:

pBgalpAloxNeo
and pBgalpAloxNeoTK

which had been linearized at the unique XhoI site and then blunted. This ligation resulted in the
formation of the following vectors:

3'SOCS-1 arm in pBgalpAloxNeo
and 3'SOCS-1 arm in pBgalpAloxNeoTK

The 5' arm of the SOCS-1 knockout vectors was constructed by using PCR to generate a 2.5 kb
PCR product from the genomic SOCS-1 clone 95-11-10 just 5' of the SOCS-1 coding region
(SOCS-1 exon). The oligos used to generate this product were:

5' oligo (sense) (2465)

AGCT AGA TGT GGA CCC TAG AAT GGG AGG [SEQ ID NO:49]

3' oligo (antisense) (2466)

AGGT AG ATG TGG CAT CCT ACT GGA GGG GCC AGC TGG [SEQ ID NO:50]

The PCR product was then digested with the restriction enzyme BglII, to generate BglII ends on
the PCR product. This 5' SOCS-1 PCR product, with BglII ends, was then ligated into the vectors
3'SOCS-1 arm in pBgalpAloxNeo and 3'SOCS-1 arm in pBgalpAloxNeoTK, which had been
linearized with the unique restriction enzyme BamHI. This resulted in the following vectors being
formed:

5'&3'SOCS-1 arms in pBgalpAloxNeo
and 5'&3'SOCS-1 arms in pBgalpAloxNeoTK

These were the final SOCS-1 knockout constructs. Both these constructs lacked the entire SOCS-1
coding region (SOCS-1 exon), which was replaced with portions of the β-gal, β-globin polyA, PGK
promoter, neomycin and PGK polyA sequences. The 5'&3'SOCS-1 arms in pBgalpAloxNeoTK
vector also contained the thymidine kinase gene sequence, between the neomycin and PGK polyA
sequences.

The vectors:

5'&3'SOCS-1 arms in pBgalpAloxNeo
and 5'&3'SOCS-1 arms in pBgalpAloxNeoTK

were linearized with the unique restriction enzyme NotI and then transfected into embryonic stem
cells by electroporation. Clones which were resistant to neomycin were selected and analysed by
Southern blot to determine if they contained the correctly integrated SOCS-1 targeting sequence.
In order to determine if correct integration had occurred, genomic DNA from the neomycin
resistant clones was digested with the restriction enzyme EcoRI. The digested DNA was then
blotted onto nylon filters and probed with a 1.5 kb EcoRI/HindIII DNA fragment, which was
further 5' of the 5' arm sequence used in the knockout constructs. The band sizes expected for
correct integration were:

Wild type SOCS-1 allele    5.4 kb

SOCS-1 knockout allele     8.2 kb in 5'&3'SOCS-1 arms in pBgalpAloxNeo
                           or 11 kb in 5'&3'SOCS-1 arms in pBgalpAloxNeoTK transformed cells.
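
The expected band sizes lend themselves to a simple screening rule for neomycin-resistant clones. The sketch below is illustrative only: the band sizes are those stated above, while the size tolerance, the function name and the result labels are assumptions introduced here and are not part of the described method.

```python
# Sketch: interpret Southern blot band sizes (kb) for an EcoRI digest probed
# with the 5' flanking fragment. Expected sizes are from the text; the
# tolerance and the result labels are hypothetical.

WILD_TYPE_KB = 5.4
KNOCKOUT_KB = {"pBgalpAloxNeo": 8.2, "pBgalpAloxNeoTK": 11.0}


def classify_clone(band_sizes_kb, construct, tolerance_kb=0.3):
    """Classify a clone from its observed band sizes for the given construct."""
    knockout_kb = KNOCKOUT_KB[construct]
    has_wild_type = any(abs(b - WILD_TYPE_KB) <= tolerance_kb for b in band_sizes_kb)
    has_knockout = any(abs(b - knockout_kb) <= tolerance_kb for b in band_sizes_kb)
    if has_wild_type and has_knockout:
        return "targeted (one wild type allele, one knockout allele)"
    if has_knockout:
        return "knockout band only"
    if has_wild_type:
        return "wild type only (no correct integration detected)"
    return "unexpected pattern"


if __name__ == "__main__":
    print(classify_clone([5.4, 8.2], "pBgalpAloxNeo"))
    print(classify_clone([5.5], "pBgalpAloxNeoTK"))
```

A correctly targeted embryonic stem cell clone would be expected to show both the wild type band and the knockout band, since only one allele is normally replaced.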

Those skilled in the art will appreciate that the invention described herein is susceptible to
variations and modifications other than those specifically described. It is to be understood that the
invention includes all such variations and modifications. The invention also includes all of the
steps, features, compositions and compounds referred to or indicated in this specification,
individually or collectively, and any and all combinations of any two or more of said steps or
features.

Table 4.1

Summary of ESTs derived from mouse SOCS-4 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-4   Mouse    mc65f04      5'   EST0549700  d13.5-14.5 mouse embryo     m4.1
                  mf42e06      5'   EST0593477  d13.5-14.5 mouse embryo     m4.1
                  mp10c10      5'   EST0747905  d8.5 mouse embryo           m4.1
                  mr81g09      5'   EST0783081  d13 embryo                  m4.1
                  mt19h12      5'   EST0816531  spleen                      m4.1

Table 4.2

Summary of ESTs derived from human SOCS-4 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-4   Human    27b5         5'   EST0534081  retina                      h4.2
                  30d2         5'   EST0534315  retina                      h4.2
                  J0159F       5'   EST0461188  foetal heart                h4.2
                  J3802F       5'   EST0461428  foetal heart                h4.2
                  EST19523     5'   EST0958884  retina                      h4.2
                  EST8U49      5'   EST1011015  placenta                    h4.2
                  EST180909    5'   EST0951375  Jurkat T-lymphocyte         h4.2
                  EST182619    5'   EST0953220  Jurkat T-lymphocyte         h4.1
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10 






ya99h09 
ye70c04 
yh53c09 

yh77gll 

yh87h05 

yi45h07 
yjO4e06 

yql2h06 
yq56a06 
yq60e02 

yq92g03 

yq97h06 

yr90f01 
yt69c03 



3' 
5' 

5' 
3' 

5' 
3' 

5' 

y 



ESTO 103262 placenta 



h4.2 



5' 
3' 

5" 

3' 

5' 
3' 

5" 

3' 

5' 
3' 



EST0172673 foeatl liver/spleen h4.2 



EST0197390 placenta 
ESTO 197391 

EST0203418 placenta 
EST0203419 

EST0204888 placenta 
EST0204773 



5' EST0246604 placenta 



EST0258541 placenta 
EST0258285 



h4.2 
h4.2 

h4.2 
h4.1 

h4.1 
h4.1 

h4.2 

h4.1 
h4.1 



,EST0309968 foetal liver spleen h4.2 

EST0346924 foetal liver spleen h4.2 

EST03472S9 foetal liver spleen h4.2 

EST0347209 h4.2 



EST0355932 foetal liver spleen M.2 

EST0355884 h4.2 

EST0357618 foetal liver spleen h4.2 

EST0357416 h4.2 



5' EST0372402 foetalliver spleen h4.2 

5' EST0338395 foetal liver spleen h4.2 
3' EST0338303 h4.2 



yv30a08 



ESTXMSSSOe foetal liver spleen h4.2 
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yv55f07 5' EST0465391 foetal liver spleen M.2 

y EST046333] h4.2 



yv57h09 

yv87h02 
yv98ell 

yw68dl0 
yw82a03 

yx08a07 
yx72h06 

yx76b09 
yy37h08 
yy66b02 

za8IfD8 
zbl8fD7 
zc06e08 

Zdl4g06 



5' EST0464336 foetalliver spleen h4.2 

3' EST0458765 h4.2 

5' EST0388085 melanocyte h4.2 

5' EST0400679 melanocyte h4.2 

3' EST0400680 h4.2 

5' EST0441370 placenta (8-9 h4.2 
wk) 

5' EST0463005 placenta (8-9 h4.2 
wk) 



3' EST0433678 



h4.1 



3' EST0407016 melanoocyte h4.1 

5' EST0435158 melanoocyte h4.2 

3' EST0422871 melanoocyte h4.I 

5' EST0434011 melanoocyte h4.2 

5' EST0451704 melanoocyte h4.2 

5' EST0505446 multiple h4.2 
sclerosis lesion 

5' EST05 11777 foetal lung h4.2 

3' EST0485315 foetal lung h4.1 

5" EST054(H73 parathyroid M.l 
tumor 

3' EST0540354 h4,l 

3" EST0564666 foetal heart h4.1 
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15 



20 



zd5Ihl2 
zd52b09 

ze25gn 
ze69fD2 

zf54f03 
zh96e07 

zv66hl2 
zs83a08 

zs83g08 



y 
5' 

3' 
3' 

5' 
3' 

5' 
5' 
3' 
5' 
5' 
3' 

5' 
3' 



EST0578099 foetal heart h4.1 

EST0582012 foetal heart h4.1 

EST0581958 h4.1 

EST0679543 foetal heart h4.1 

EST0635563 retina h4,2 

EST0635472 h4.1 

EST06801 1 1 retina h4.2 

EST0616241 foetal liver h4.2 
Spleen 

EST0615745 h4.2 

EST1043265 8-9w foetus h4.2 

EST0920072 germinal centre h4.1 
Bcell 



EST0920016 



h4.1 



EST0920121 germinal centre h4.1 
Bcell 



EST0920122 



h4.1 



Table 5.1

Summary of ESTs derived from mouse SOCS-5 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-5   Mouse    mc55a01      5'   EST0541556  d13.5-14.5 mouse embryo     m5.1
                  mh98f09           EST0638237  placenta                    m5.1
                  my26h12      5'   EST0859939  mixed organs                m5.1
                  ve24e06      5'   EST0819106  heart                       m5.1

Table 5.2

Summary of ESTs derived from human SOCS-5 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-5   Human    EST15B103    ?    EST0258029  adipose tissue              h5.1
                  EST15B105    ?    EST0258028  adipose tissue              h5.1
                  EST27530     5'   EST0965892  cerebellum                  h5.1
                  zf50f01      5'   EST0679820  retina                      h5.1



Table 6.1 

Summary of ESTs derived from mouse SOCS-6 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-6   Mouse    mc04c05      5'   EST0525832  d19.5 embryo                m6.1
                  md48a03      5'   EST0566730  d13.5-14.5 embryo           m6.1
                  mf31d03      5'   EST0675970  d13.5-14.5 embryo           m6.1
                  mh26b07      5'   EST0628752  d13.5-14.5 placenta         m6.1
                  mh78e11      5'   EST0637608  d13.5-14.5 placenta         m6.1
                  mh88h09      5'   EST0644383  d13.5-14.5 placenta         m6.1
                  mh94h07      5'   EST0638078  d13.5-14.5 placenta         m6.1
                  mi27h04      5'   EST0644252  d13.5-14.5 embryo           m6.1
                  mj29c05      5'   EST0664093  d13.5-14.5 embryo           m6.1
                  mp66g04      5'   EST0757905  thymus                      m6.1
                  mw75g03      5'   EST0847938  liver                       m6.1
                  va53b05      5'   EST0901540  d12.5 embryo                m6.1
                  vb34h02      5'   EST0930132  lymph node                  m6.1
                  vc55d07      3'   EST1057735  2 cell embryo               m6.1
                  vc59e05      3'   EST1058201  2 cell embryo               m6.1
                  vc67d03      3'   EST1057849  2 cell embryo               m6.1
                  vc68d10      3'   EST1058663  2 cell embryo               m6.1
                  vc97h01      3'   EST1059343  2 cell embryo               m6.1
                  vc99c08      3'   EST1059410  2 cell embryo               m6.1
                  vd07h03      3'   EST1058173  2 cell embryo               m6.1
                  vd08c01      3'   EST1058275  2 cell embryo               m6.1
                  vd09b12      3'   EST1058632  2 cell embryo               m6.1
                  vd19b02      3'   EST1059723  2 cell embryo               m6.1
                  vd29a04      3'   ?           none found                  m6.1
                  vd46d06      3'   ?           none found                  m6.1

Table 6.2

Summary of ESTs derived from human SOCS-6 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-6   Human    yf61e08      5'   EST0184387  d73 infant brain            h6.1
                  yf93a09      5'   EST0186084  d73 infant brain            h6.1
                  yg05f12      5'   EST0191486  d73 infant brain            h6.1
                  yg41f04      5'   EST0195017  d73 infant brain            h6.1
                  yg45c02      5'   EST0185308  d73 infant brain            h6.1
                  yh11f10      5'   EST0236705  d73 infant brain            h6.1
                  yh13b05      5'   EST0237191  d73 infant brain            h6.1
                               3'   EST0236958                              h6.2
                  zc35a12      5'   EST0555518  senescent fibroblast
                  ze02h08      5'   EST0603826  foetal heart                h6.1
                               3'   EST0603718                              h6.2
                  zl09a03      5'   EST0773936  pregnant uterus             h6.1
                               3'   EST0773892                              h6.1
                  zl69e10      5'   EST0683363  colon                       h6.1
                  zn39d08      5'   EST0718885  endothelial cell            h6.1
                  zo39e06      5'   EST0785947  endothelial cell            h6.1

Table 7.1 

Summary of ESTs derived from mouse SOCS-7 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-7   Mouse    mj39a01      5'   EST0665627  d13.5/14.5 embryo           m7.1
                  vi52h07      5'   EST1267404  d7.5 embryo                 m7.1

Table 7.2

Summary of ESTs derived from human SOCS-7 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-7   Human    STS WI-30171      (G21563)    Chromosome 2                h7.2
                  EST00939     5'   EST0000906  hippocampus                 h7.1
                  EST12913     3'   EST0944382  uterus                      h7.2
                  yc29b05      3'   EST0128727  liver                       h7.2
                  yp49f10      3'   EST0301914  retina                      h7.2
                  zt10f03      5'   EST0922932  germinal centre B cell      h7.2
                               3'   EST0921231                              h7.1
                  zx73g04      3'   EST1102975  ovarian tumour              h7.1

Table 8.1

Summary of ESTs derived from mouse SOCS-8 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-8   Mouse    mj16e09      r1   EST0666240  d13.5/14.5 embryo           m8.1
                  vj27a029     r1   EST1155973  heart                       m8.1

Table 9.1 

Summary of ESTs derived from mouse SOCS-9 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-9   Mouse    me^5d05      5'   EST0585211  d13.5/14.5 embryo           m9.1

Table 9.2

Summary of ESTs derived from human SOCS-9 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-9   Human    CSRL-83f2-u       (B06659)    chromosome 11               h9.1
                  EST114054    5'   EST0939759  placenta                    h9.1
                  yy06b07      3'   EST0434504  melanocyte                  h9.1
                  yy06g06      5'   EST0443783  melanocyte                  h9.1
                  zr40c09      5'   EST0832461  melanocyte, heart, uterus   h9.1
                  zr72h01      5'   EST0892025  melanocyte, heart, uterus   h9.1
                               3'   EST0892026                              h9.1
                  yx92c08      5'   EST0441160  melanocyte                  h9.1
                  yx93b08      5'   EST0441260  melanocyte                  h9.1
                  hfe0662      5'   EST0889611  foetal heart                h9.1



Table 10.1 

Summary of ESTs derived from mouse SOCS-10 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-10  Mouse    mb14d12      5'   EST0549887  d19.5 embryo                m10.1
                  mb40f06      5'   EST0515064  d19.5 embryo                m10.1
                  mg89b11      5'   EST0630631  d13.5-14.5 embryo           m10.1
                  mq89e12      5'   EST0776015  heart                       m10.1
                  mp03g12      5'   EST0741991  heart                       m10.1
                  vh53c11      5'   EST1154634  mammary gland               m10.1



Table 10.2

Summary of ESTs derived from human SOCS-10 cDNAs

SOCS     Species  EST name     End  EST no      Library source                  Contig

SOCS-10  Human    aa48h10      3'   EST1135220  germinal centre B cell          h10.2
                  zp35h01      3'   EST0819137  muscle                          h10.2
                  zp97h12      5'   EST0835442  muscle                          h10.2
                               3'   EST0831211                                  h10.2
                  zq08h01      5'   EST0835907  muscle                          h10.1
                  zr34g05      5'   EST0834251  melanocyte, heart, uterus       h10.2
                               3'   EST0834440                                  h10.2
                  EST73000     5'   EST1004491  ovary                           h10.2
                  HSDHEI005    ?    EST0013906  heart                           h10.2

Table 11.1

Summary of ESTs derived from human SOCS-11 cDNAs

SOCS     Species  EST name     End  EST no      Library source                  Contig

SOCS-11  Human    zt24h06      r1   EST0925023  ovarian tumor                   11.1
                  zr43b02      r1   EST0873006  melanocyte, heart, uterus       11.1
                               s1   EST0872954                                  11.1

Table 12.1

Summary of ESTs derived from mouse SOCS-12 cDNAs

SOCS     Species  EST name     End  EST no      Library source                  Contig

SOCS-12  Mouse    EST03803     5'   EST1054173  day 7.5 emb fetoplacental cone  m12.1
                  mt18f02           EST0817652  3NbMS spleen                    m12.1
                  mz60g10      5'   EST0990872  lymph node
                  vaOScll      5'   EST0909449  lymph node



Table 12.2 

Summary of ESTs derived from human SOCS-12 cDNAs



10 socs 



Species EST name 






SOCS-12 Human 



End EST no 



STS-SHGC-13867 

EST177695 5' 

EST64550 5' 

EST7686e 5' 

PMY2369 5' 

yb38f04 5' 



yg74el2 
ylil3g04 

yh48b06 
yh53a05 

yn48h09 

yn90a09 3* 

yo08f03 5' 
3* 

yoXleOl 3' 

yo63bl2 5 ' 
3* 

yg56g02 3* 

zh57c04 3 * 

zhlBhOX 3- 

zh99all 3* 

2o92hl2 5* 



Library source 



Chromosome 2 
EST0948071 Jxirkat cells 
EST0997367 Jurkat cells 
ESTl 007291 pineal body 
EST1X15998 KG-1 
EST0108807 foetal spleen 



EST0224407 

EST023722 6 
EST0236992 

yli48b06 

EST0197282 
EST0197486 

EST0278258 
EST0278259 

EST0302557 

EST0301790 
EST03fl2059 

7 none foiind 

EST0303606 
EST03 04085 

EST0346935 

EST0S94201 

EST0598945 

ESTa618570 

EST08033 92 



d73 brain 
d73 brain 

placenta 
placenta 

brain 

brain 
brain 



breast 

foefcaJ. liver spleen 
foetal liver spleen 
foecal liver spleen 
foetal liver spleen 
ovarian cancer 



Contig 

hl2.2 

hl2,l 

hl2,l 

hl2.2 

hl2.1 

hl2.1 
hl2.2 

hl2.1 

1x12.1 
hl2.2 

hl2.2 

hl2.2 
hl2.2 

hl2.2 
hl2.2 

hl2.2 

hl2.2 
hl2.2 

hl2.2 

hl2.2 
hl2.2 

hl2.1 

hl2.2 

hl2.2 

hl2-2 

hl2.1 
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zs48c01 



5' 
3 ' 



EST0e03393 

EST0925714 
£570925530 



zs45h02 3' EST0932296 

Table 13.1 

Summary of ESTs derived from mouse SOCS-13 cDNAs 



10 socs 



Species EST name 



End ESTtio 



germinal centre B cell 



Library source 



hl2.2 

hl2.1 
hl2,2 



germinal centzre B cell hl2,2 



Contig 



15 



SOCS-13 Mouse ma39c09 



mc60c05 



20 



25 



nii78g05 
mklOcli 
mo48gl2 

vb57c07 
vh07cli 



5' EST05 17875 day 19.5 embryo 

5' EST05S4950 day 13,5/14.5 embryo 

5' EST0653S34 day 19.5 embryo 

5* EST0735158 day 19.5 embryo 

5* EST07451 1 1 day 10.5 embryo 

5* EST0762827 thymus 

5' EST1028976 day 1 1 .5 embiyo 

5' ESTl 1 17269 mammary gland 



ml3.1 
ml3.1 
ml3.1 
ml3.1 
ml3.1 
ml3.1 
ml3.1 
ml3.1 



Table 13.2

Summary of ESTs derived from human SOCS-13 cDNAs

SOCS     Species  EST name     End  EST no      Library source

SOCS-13  Human    EST591$1     5'   EST099272$  infant brain



Table 14.1

Summary of ESTs derived from mouse SOCS-14 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-14  Mouse    mi75e03      5'   EST0651892  d19.5 embryo                m14.1
                  vd29h11      5'   EST1067080  2 cell embryo               m14.1
                  vd53g07      5'   EST1119627  2 cell embryo               m14.1



Table 15.1

Summary of ESTs derived from mouse SOCS-15 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-15  Mouse    mh29b05      5'   EST0628834  placenta                    m15.1
                  mh98h09      5'   EST0638243  placenta                    m15.1
                  ml45a02      5'   EST0687171  testis                      m15.1
                  mu43a10      5'   EST851588   thymus                      m15.1
                  my38c09      5'   EST878461   pooled organs               m15.1
                  vj37h07      5'   EST1174791  diaphragm                   m15.1
                  AC002393                      Chromosome 6 BAC            m15.1

Table 15.2

Summary of ESTs derived from human SOCS-15 cDNAs

SOCS     Species  EST name     End  EST no      Library source              Contig

SOCS-15  Human    EST98889     5'   EST1026568  thyroid                     h15.1
                  nc48b05      3'   EST1138057  colon tumour                h15.1
                  yb12h12      3'   EST0098885  placenta                    h15.1
                                    EST0098886                              h15.1
                  HSU47924                      Chromosome 12 BAC           h15.1
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SEQUENCE LISTING 

(1) GENERAL INFORMATION:

(i) APPLICANT: (Other than US) AMRAD OPERATIONS PTY LTD

(US Only)

(ii) TITLE OF INVENTION: THERAPEUTIC AND DIAGNOSTIC AGENTS

(iii) NUMBER OF SEQUENCES: 49

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: DAVIES COLLISON CAVE

(B) STREET: 1 LITTLE COLLINS STREET

(C) CITY: MELBOURNE

(D) STATE: VICTORIA

(E) COUNTRY: AUSTRALIA

(F) ZIP: 3000

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: PCT INTERNATIONAL

(B) FILING DATE: 31-OCT-1997

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PO 5117

(B) FILING DATE: 14-FEB-1997

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PO 3384

(B) FILING DATE: 01-NOV-1996

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: HUGHES DR. E JOHN L

(C) REFERENCE/DOCKET NUMBER: EJH/EK

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: +61 3 9254 2777

(B) TELEFAX: +61 3 9254 2770

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CACGCCGCCC ACGTGAAGGC 20

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTCGCCAATG ACAAGACGCT 20

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1236 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..636

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:


CGAGGCTCAA GCTCCGGGCG GATTCTGCGT GCCGCTCTCG CTCCTTGGGG TCTGTTGGCC 
GGCCTGTGCC ACCCGGACGC CCGGCTCACT GCCTCTGTCT CCCCCATCAG CGCAGCCCCG 
GACGCTATGG CCCACCCCTC CAGCTGGCCC CTCGAGTAGG 



ATG GTA GCA CGC AAC CAG GTG GCA GCC GAC AAT GCG ATC TCC CCG GCA 
Met Val Ala Arg Asn Gin val Ala Ala Asp Asn Ala He Ser Pro Ala 
15 10 15 

GCA GAG CCC CGA CGG CGG TCA GAG CCC TCC TCG TCC TCG TCT TCG TCC 
Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

TCG CCA GCG GCC CCC GTG CGT CCC CGG CCC TGC CCG GCG GTC CCA GCC 



3 ■ - G-S7 : ' 6 : 4S 
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Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala 

35 40 45 

CCA GCC CCT GGC GAG ACT CAC TTC CGC ACC TTC CGC TCC CAC TCC GAT 192 

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser Kis Ser Asp 

50 SB 60 



TAC CGG CGC ATC ACG CGG ACC AGC GCG CTC CTG GAC GCC TGC GGC TTC 
Tyr Arg Arg lie Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe 
65 70 75 80 



240 



288 



TAT TGG GGA CCC CTG AGC GTG CAC GGG GCG CAC GAG CGG CTG CGT GCC 
Tvr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala 
85 90 95 

GAG CCC GTG GGC ACC TTC TTG GTG CGC GAC AGT CGT CAA CGG AAC TGC 336 
Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gin Arg Asn Cys 
100 105 110 

TTC TTC GCG CTC AGC GTG AAG ATG GCT TCG GGC CCC ACG AGC ATC CGC 384
Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

GTG CAC TTC CAG GCC GGC CGC TTC CAC TTG GAC GGC AGC CGC GAG ACC 432 
Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140 

TTC GAC TGC CTT TTC GAG CTG CTG GAG CAC TAC GTG GCG GCG CCG CGC 480 
Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

CGC ATG TTG GGG GCC CCG CTG CGC CAG CGC CGC GTG CGG CCG CTG CAG 528 
Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

GAG CTG TCT CGC CAG CGC ATC GTG GCC GCC GTG GGT CGC GAG AAC CTG 576
Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190

GCG CGC ATC CCT CTT AAC CCG GTA CTC CGT GAC TAC CTG AGT TCC TTC 624 
Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205 

CCC TTC CAG ATC TGA CCGGCTG CCGCTGTGCC GCAGCATTAA GTGGGGGCGC 676 
Pro Phe Gln Ile *
210 

CTTATTATTT CTTATTATTA ATTATTATTA TTTTTCTGGA ACCACGTGGG AGCCCTCCCC 736

GCCTGGGTCG GAGGGAGTGG TTGTGGAGGG TGAGATGCCT CCCACTTCTG GCTGGAGACC 796 

TCATCCCACC TCTCAQGGGT GGGGGTGCTC CCCTCCTGGT OCTCCCTCCG GGTCCCCCCT 856 

GGTTGTAGCA GCTTGTGTCT GGGGCCAGGA CCTGAATTCC ACTCCTACCT CTCCATGTTT 916 

ACATATTCCC AGTATCTTTG CACAAACCAG GGGTCGGGGA GGGTCTCTGG CTTCATTTTT 976 

CTGCTGTGCA GAATATCCTA TTTTATATTT TTACAGCCAG TTTAGGTAAT AAACTTTATT 1036 

ATGAAAGTTT TTTTTTAAAA GAAAAAAAAA AAAAAAAAA 1075 



(2) INFORMATION FOR SEQ ID NO: 4: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 212 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala
35 40 45

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp
50 55 60

Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala
85 90 95

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg
145 150 155 160

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205

Pro Phe Gln Ile
210



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1121 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 223..819

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GCGATCTGTG GGTGACAGTG TCTGCGAGAG ACTTTGCCAC ACCATTCTGC CGGAATTTGG 60 
AGAAAAAGAA CCAGCCGCTT CCAGTCCCCT CCCCCTCCGC CACCATTTCG GACACCCTGC 120 
ACACTCTCGT TTTGGGGTAC CCTGTGACTT CCAGGCAGCA CGCGAGGTCC ACTGGCCCCA 180 
GCTCGGGCGA CCAGCTGTCT GGGACGTGTT GACTCATCTC CC ATG ACC CTG CGG 234
Met Thr Leu Arg
1

TGC CTG GAG CCC TCC GGG AAT GGA GCG GAG AGG ACG CGG AGC CAG TGG 282 
Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr Arg Ser Gln Trp
5 10 15 20 

GGG ACC GCG GGG TTG CCG GAG GAA CAG TCC CCC GAG GCG GCG CGT CTG 330
Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu Ala Ala Arg Leu
25 30 35

GCG AAA GCC CTG CGC GAG CTC AGT CAA ACA GGA TGG TAG TGG GGA AGT 378 
Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp Tyr Trp Gly Ser
40 45 50 

ATG ACT GTT AAT GAA GCC AAA GAG AAA TTA AAA GAG GCT CCA GAA GGA 426

Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu Ala Pro Glu Gly 
55 60 65 

ACT TTC TTG ATT AGA GAT AGT TCG CAT TCA GAC TAG CTA CTA ACT ATA 474 
Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr Leu Leu Thr Ile
70 75 80 

TCC GTT AAG ACG TCA GCT GGA CCG ACT AAC CTG CGG ATT GAG TAG CAA 522 
Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg Ile Glu Tyr Gln
85 90 95 100

GAT GGG AAA TTC AGA TTG GAT TCT ATC ATA TGT GTC AAG TCC AAG CTT 570 
Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val Lys Ser Lys Leu
105 110 115

AAA CAG TTT GAC AGT GTG GTT CAT CTG ATT GAC TAG TAT GTC CAG ATG 618 
Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr Tyr Val Gln Met
120 125 130 

TGC AAG GAT AAA CGG ACA GGC CCA GAA GCC CCA CGG AAT GGG ACT GTT 666
Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg Asn Gly Thr Val
135 140 145 

CAC CTG TAG CTG ACC AAA CCT CTG TAT ACA TCA GCA CCC ACT CTG CAG 714 
His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala Pro Thr Leu Gln
150 155 160 

CAT TTC TGT CGA CTC GCC ATT AAC AAA TGT ACC GGT ACG ATC TGG GGA 762 
His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly Thr Ile Trp Gly
165 170 175 180 

CTG CCT TTA CCA ACA AGA CTA AAA GAT TAC TTG GAA GAA TAT AAA TTC 810 
Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu Glu Tyr Lys Phe 
185 190 195 

CAG GTA TAAGTATTTC TCTCTCTTTT TCGTTTTTTT TTAAAAAAAA AAAAACACAT 866 
Gln Val

GCCTCATATA GACTATCTCC GAATGCAGCT ATGTGAAAGA GAACCCAGAG GCCCTCCTCT 926 

GGATAACTGC GCAGAATTCT CTCTTAAGGA CAGTTGGGCT CAGTCTAACT TAAAGGTGTG 986
AAGATGTAGC TAGGTATTTT AAAGTTCCCC TTAGGTAGTT TTAGCTGAAT GATGCTTTCT 1046
TTCCTATGGC TGCTCAAGAT CAAATGGCCC TTTTAAATGA AACAAAACAA AACAAAACAA 1106
AAAAAAAAAA AAAAA 1121 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 198 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Thr Leu Arg Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr
1 5 10 15

Arg Ser Gln Trp Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu
20 25 30

Ala Ala Arg Leu Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp
35 40 45

Tyr Trp Gly Ser Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu
50 55 60

Ala Pro Glu Gly Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr
65 70 75 80

Leu Leu Thr Ile Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg
85 90 95

Ile Glu Tyr Gln Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val
100 105 110

Lys Ser Lys Leu Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr
115 120 125

Tyr Val Gln Met Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg
130 135 140

Asn Gly Thr Val His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala
145 150 155 160

Pro Thr Leu Gln His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly
165 170 175

Thr Ile Trp Gly Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu
180 185 190

Glu Tyr Lys Phe Gln Val
195

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2197 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 18..695

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CGCTGGCTCC GTGCGCC ATG GTC ACC CAC AGC AAG TTT CCC GCC GCC GGG
Met Val Thr His Ser Lys Phe Pro Ala Ala Gly
1 5 10

ATG AGC CGC CCC CTG GAC ACC AGC CTG CGC CTC AAG ACC TTC AGC TCC 
Met Ser Arg Pro Leu Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser 
15 20 25 

AAA AGC GAG TAC CAG CTG GTG GTG AAC GCC GTG CGC AAG CTG CAG GAG 
Lys Ser Glu Tyr Gln Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu
30 35 40 

AGC GGA TTC TAC TGG AGC GCC GTG ACC GGC CGC GAG GCG AAC CTG CTG 
Ser Gly Phe Tyr Trp Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu
45 50 55 

CTC AGC GCC GAG CCC GCG GGC ACC TTT CTT ATC CGC GAC AGC TCG GAC 
Leu Ser Ala Glu Pro Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp
60 65 70 75 

CAG CGC CAC TTC TTC ACG TTG AGC GTC AAG ACC CAG TCG GGG ACC AAG 
Gln Arg His Phe Phe Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys
80 85 90 

AAC CTA CGC ATC CAG TGT GAG GGG GGC AGC TTT TCG CTG CAG AGT GAC 
Asn Leu Arg Ile Gln Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp
95 100 105 

CCC CGA AGC ACG CAG CCA GTT CCC CGC TTC GAC TGT GTA CTC AAG CTG 
Pro Arg Ser Thr Gln Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu
110 115 120 

GTG CAC CAC TAC ATG CCG CCT CCA GGG ACC CCC TCC TTT TCT TTG CCA 
Val His His Tyr Met Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro
125 130 135

CCC ACG GAA CCC TCG TCC GAA GTT CCG GAG CAG CCA CCT GCC CAG GCA 
Pro Thr Glu Pro Ser Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala
140 145 150 155 

CTC CCC GGG AGT ACC CCC AAG AGA GCT TAC TAC ATC TAT TCT GGG GGC 
Leu Pro Gly Ser Thr Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly
160 165 170 

GAG AAG ATT CCG CTG GTA CTG AGC CGA CCT CTC TCC TCC AAC GTG GCC 
Glu Lys Ile Pro Leu Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala
175 180 185 

ACC CTC CAG CAT CTT TGT CGG AAG ACT GTC AAC GGC CAC CTG GAC TCC 
Thr Leu Gln His Leu Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser
190 195 200 

TAT GAG AAA GTG ACC CAG CTG CCT GGA CCC ATT CGG GAG TTC CTG GAT 
Tyr Glu Lys Val Thr Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp
205 210 215 

CAG TAT GAT GCT CCA CTT TAAGGAGCAA AAGGGTCAGA GGGGGGCCTG 
Gln Tyr Asp Ala Pro Leu
220 225

GGTCQGTCCa TCGCCTCTCC TCCGAGGCAC ATGGCACAAG CACAAAAATC CAGCCCCAAC 782

GGTCGGTAGC TCCCAGTGAG CCAGGGGCAG ATTGGCTTCT TCCTCAGGCC CTCCACTCCC 842 

GCAGAGTAGA GCTGGCAGGA CCTGGAATTC GTCTGAGGGG AGGGGGAGCT GCCACCTOCT 902 

TTCCCCCCTC CCCCAGCTCC AGCTTCTTTC AAGTGGAGCC AGCCGGCCTG GCCTGGTGGG 962 

ACAATACCTT TGACAAGCGG ACTCTCCCCT CCCCTTCCTC CACACCCCCT CTGCTTCCCA 1022 

AGGGAGGTGG GGACACCTCC AAGTGTTGAA CTTAGAACTG CAAGGGGAAT CTTCAAACTT 1082 

TCCCGCTGGA ACTTGTTTGC GCTTTGATTT GGTTTGATCA AGAGCAGGCA CCTCGGGGAA 1142 

GGATGGAAGA GAAAAGGGTG TGTGAAGGGT TTTTATGCTG GCCAAAGAAA TAACCACTCC 1202 

CACTGCCCAA CCTAGGTGAG GAGTGGTGGC TCCTGGCTCT GGGGAGAGTG GCAAGGGGTG 1262 

ACCTGAAGAG AGCTATACTG GTGCCAGGCT CCTCTCCATG GGGCAGCTAA TGAAACCTCG 1322 

CAGATCCCTT GCACCCCAGA ACCCTCCCCG TTGTGAAGAG GCAGTAGCAT TTAGAAGGGA 1382

GACAGATGAG GCTGGTGAGC TGGCCGCCTT TTCCAACACC GAAGGGAGGC AGATCAACAG 1442 

ATGAGCCATC TTGGAGCCCA GGTTTCCCCT GGAGCAGATG GAGGGTTCTG CTTTGTCTCT 1502 

CCTATGTGGG GCTAGGAGAC TCGCCTTAAA TGCCCTCTGT CCCAGGGATG GGGATTGGCA 1562 

CACAAGGAGC CAAACACAGC CAATAGGCAG AGAGTTGAGG GATTCACCCA GGTGGCTACA 1622 

GGCCAGGGGA AGTOGCTGCA GGGGAGAGAC CCAGTCACTC CAGCAGACTC CTGAGTTAAC 1682 

ACTGGGAAGA CATTGGCCAG TCCTAGTCAT CTCTCGGTCA GTAGGTCCGA GAGCTTCCAG 1742 

GCCCTGCACA GCCCTCCTTT CTCACCTGGG GGGAGGCAGG AGGTGATGGA GAAGCCTTCC 1802 

CATGCCGCTC ACAGGGGCCT CACGGGAATG CAGCAGCCAT GCAATTACCT GGAACTGGTC 1862 

CTGTGTTGGG GAGAAACAAG TTTTCTGAAG TCAGGTATGG GGCTGGGTCG GGCAQCTGTG 1922 

TGTTGGGGTG GCTTTTTTCT CTCTGTTTTG AATAATGTTT ACAATTTGCC TCAATCACTT 1982 

TTATAftfiAAT CCACCTCCAG CCCGCCCCTC TCCCCACTCA GGCCTTCGAG GCTGTCTGAA 2042 

GATGCTTGAA AAACTCAACC AAATCCCAGT TCAACTCAGA CTTTGCACAT ATATTTATAT 2102

TTATACTCAG AAAAGAAACA TTTCAGTAAT TTATAATAAA AGAGCACTAT TTTTTAATGA 2152 

AAAAAAAAAA AAAAAAAAAA AAAAA 2187 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 225 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Val Thr His Ser Lys Phe Pro Ala Ala Gly Met Ser Arg Pro Leu
1 5 10 15

Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser Lys Ser Glu Tyr Gln
20 25 30

Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu Ser Gly Phe Tyr Trp
35 40 45

Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu Leu Ser Ala Glu Pro
50 55 60

Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp Gln Arg His Phe Phe
65 70 75 80

Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys Asn Leu Arg Ile Gln
85 90 95

Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp Pro Arg Ser Thr Gln
100 105 110

Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu Val His His Tyr Met
115 120 125

Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro Pro Thr Glu Pro Ser
130 135 140

Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala Leu Pro Gly Ser Thr
145 150 155 160

Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly Glu Lys Ile Pro Leu
165 170 175

Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala Thr Leu Gln His Leu
180 185 190

Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser Tyr Glu Lys Val Thr
195 200 205

Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp Gln Tyr Asp Ala Pro
210 215 220

Leu
225



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1094 base pairs
(B) TYPE: nucleic acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



CTCCGGCTGO CCCCTTCTGT AGGATGGTAG CACACAACCA QGTGGCAGCC GACAATGCAG 60 

TCTCCACAGC AGCAGAGCCC CGACGGCGGC CAGAACCTTC CTCCTCTTCC TCCTCCTCGC 120 

CCGCGGCCCC CGCGCGCCCG CGGCCGTGCC CCGCOGTCCC GGCCCCGGCC CCCGGCGACA 180 

CGCACTTCCG CACATTCCGT TCGCACGCCG ATTACCGGCG CATCACGCGC GCCAGCGCGC 240 

TCCTGGACGC CTGCGGATTC TACTGGGGGC CCCTGAGCGT GCACGGGGCG CACGAGCGGC 300 

TGCGCGCCGA GCCCGTGGGC ACCTTCCTGG TGCGCGACAG CCGCCAGCGG AACTGCTTTT 360 

TCGCCCTTAG CGTGAAGATG GCCTCGGGAC CCACGAGCAT CCGCGTGCAC TTTCAGGCCG 420 

GCCGCTTTCA CCTGGATGGC AGCCGCGAGA GCTTCGACTG CCTCTTCGAG CTGCTGGAGC 480

ACTACGTGGC GGCGCCGCGC CGCATGCTGG GGGCCCCGCT GCGCCAGCGC CGCGTGCGGC 540

CGCTGCAGGA GCTGTGCCGC CAGCGCATCG TGGCCACCGT GGGCCGCGAG AACCTGGCTC 600 

GCATCCCCCT CAACCCCGTC CTCCGCGACT ACCTGAGCTC CTTCCCCTTC CAGATTTCAC 660

CGGCAGCGCC CGCCGTGCAC GCAGCATTAA CTGGGATGCC GTGTTATTTT GTTATTACTT 720 

GCCTGGAACC ATGTGGGTAC CCTCCCCGGC CTGGGTTGGA GGGAGCGGAT GGGTGTAGGO 780 

GCGAGOCGCC TCCCGCCCTC GGCTCGAGAC QAGGCCGCAG ACCCCTTCTC ACCTCTTGAG 840 

GGGGTCCTCC CCCTCCTGGT GCTCCCTCTG GGTCCCCCTG GTTGTTGTAG CAGCTTAACT 900 

GTATCTGGAG CCAGGACCTG AACTCGCACC TCCTACCTCT TCATGTTTAC ATATACCCAG 960 

TATCTTTGCA CAAACCAGGG GTTGGGGGAG GGTCTCTGGC TTTATTTTTC TGCTGTGCAG 1020 

AATCCTATTT TATATTTTTX AAAGTCAGTT TAGGTAATAA ACTTTATTAT GAAAGTTTTT 1080 

TTTTTTAAAA AAAA 1094 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Val Ala His Asn Gln Val Ala Ala Asp Asn Ala Val Ser Thr Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Ala Val Pro Ala Pro
35 40 45

Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ala Asp Tyr
50 55 60

Arg Arg Ile Thr Arg Ala Ser Ala Leu Leu Asp Ala Cys Gly Phe Tyr
65 70 75 80

Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala Glu
85 90 95

Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys Phe
100 105 110

Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg Val
115 120 125

His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Ser Phe
130 135 140

Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg Arg
145 150 155 160

Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln Glu
165 170 175

Leu Cys Arg Gln Arg Ile Val Ala Thr Val Gly Arg Glu Asn Leu Ala
180 185 190

Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe Pro
195 200 205

Phe Gln Ile
210

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2807 base pairs 

(B) TYPE: nucleic acid 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



GGAAACCGAG GCGGGGAGAC CAGGAGGCCT TGGCCTCAGA GCTTCAGAGT CGCGTGGCAG 60 

CAAACAGAGA AACCTGTAGA GGGCAGTGTG CGTCACTTAG CTCAGGGAAG CTGCACGCGA 120

AACTCACCCG CCTTCATTCA TAAACATCGT CAGCTAGGCA CCTACTCCTG GGCTTTCAGG 180 

ACAAACTGAA TCACGAAACC ACAGTGTCCT TAAAATAGGT CTGACCGCCT GAATCCCTGG 240 

CCAAGGTGTG TACGGGGCAT GGGAGCCCTT GTGCAGAGAT GCTTGCAGGA GCCTTGAGGG 300

GCTCTGTAAG ACAGAGGCTA GGAAGACAAA GTTGGGGGCT ACAGCTTCTT GTCCTGCCCG 360

GGGCCTCAGT TTCTTCGGTT GCCCACGTAG GAGTGCAGAG AGTCCAGCCC CTGGGGACCC 420

AACCCAACCC CGCCCAGTTT CCGAGGAACT CGTCCGGGAG CGGGGGCGCC CCTCCCGCAC 480 

CGCCTTAGGC TTCCTTTGAA GCCTCTGCGG TCAGGCCACC GCTTCCTGGG AAGCCCAAGC 540

CAAGGCCAGG CCGAGTCGCC AACGGGAGGG GCCCGCGCGC GATTCTGGAG GAGGGCGGCG 600

GCCCCACAGG TCTCCAGGGC TGGCTAGCCG GGCTCCTAGA GCGOAGACTG CCAAGGCCTT 660

CGGGTCCTGG GCAGGAAGGA TCCTGGCAGG GAGGAGTTGC TTGGGGGGTG GGGGGGAAAG 720

GCTCCAGGCG CGGTGGAGCT CTGACCAGGA GAATGCACAC ACTCGGAGGG GAGGAGGCGX 780

GTCAGCCCCA AGCTAGCATC CCACCCGGGG AGCAGCGATG TGGGGCGAAG GTAGCCAGAG 840

CAAAAGAGCA GGCACCAGGT GACACGAAAC AGAAGATTCC GGGTAGAGCC AGAACCCCAG 900

AACPCC C ATT CAGGGAAGGT GCGAGGCGAG AACGAGTTAG GTGGACCCTC TCCAGGGGCA 960

TCTAAAGAGA ACCCGAAGGA CTTGCCGGAA AGAGAAACCG AAAGCGGCGG 1020

TGGGCGGGAT CGGTGGGCGG GGCCTCCCTG GTTTAAGAGC TTGATGCAGG GGCGGGCAGC 1080

AGCAGAGAGA ACTGCGGCCG TGGCAGCGOC ACGGCTCCCG GCCCCGGAGC ATGCGCGACA 1140

GCAGCCCCGG AACCCCCAGC CGCGGCGCCC CGCGTCCCGC CGCCAGGTGA GCCGAGGCAG 1200

CTGCG2VAGGA GCAGGCGGGA GGGGATGGGA GGAAGGGGAG CAGAGCCTGG CAGGACTATC 1260

CTCGCAGACT GCATGGCGGG GTCGTGGATG CTATGCCTCT GGCGCCCGCC CCACCGGCTG 1320

GCCCAGGCGG CCCCTCGCGC GCGCGGGGCG CCGTCAGCCC CTCCTCTCCG GCCCTGAGCC 1380

CGGATCGTCC GCCCGGGTTC CAGTTCCCGG CGTGGCCAGT AGGCGGCAAC CGCGAGGCGG 1440

CAAGCCACCC AGCGGGGACG GCCTGGAGTC GGGCCCCTCT CCACGCCCCC TTCTCCACGC 1500

GCGCGGGGAG GCAGGGCTCC ACCGCCAGTC TGGAAGGGTT CCACATACAG GAACGGCCTA 1560

CTTCGCAGAT GAGCCCACCG AGGCTCAGGC TCCGGGCGGA TTCTGCGTGT CACCCTCGCT 1620

CCTTGGGGTC CGCTGGCCGG CCTGTGCCAC CCGGACGCCC GGTTCACTGC CTCTGTCTCC 1680

CCCATCAGCG CAGCCCCGGA CGCTATGGCC CACCCCTCCA GCTGGCCCCT CGAGTAGGAT 1740

GGTAGCACGT AACCAGGTGG AAGCCGACAA TGCGATCTCC CCGGCATCAG AGCCCCGACG 1800

GCGGCCAGAG CCATCCTCGT CCTCGTCTTC GTCCTCGCCG GCGGCCCCGG CGCGTCCCCG 1860

GCCCTGCCCG GTGGTCCCGG CCCCGGCTCC GGGCGACACT CACTTCCGCA CCTTCCGCTC 1920

CCACTCTGAT TACCGGCGCA TCACGCGGAC CAGCGCTCTC CTGGACGCCT GCGGCTTCTA 1980

CTGGGGACCC CTGAGCGTGC ATGGGGCGCA CGAACGGCTG CGTTCCGAAC CCGTGGGCAC 2040

CTTCTTGGTG CGCGACAGTC GCCAGCGGAA CTGCTTCTTC GCGCTCAGCG TGAAGATGGC 2100

TTCGGGCCCC ACGAGCATTC GTGTGCACTT CCAGGCCGGC CGCTTCCACC TGGACGGCAA 2160

CCGCGAGACC TTCGACTGCC TCTTCGAGCT GCTGGAGCAC TACGTGGCGG CGCCGCGCCG 2220

CATGTOGGGG GCCCCACTGC GCCAGCGCCG CGTGCGGCCG CTGCAGGAGC TGTGTCOCCA 2280

GCGCATCGTG GCCGCCGTGG GTCGCGAGAA CCTGGCACGC ATCCCTCTTA ACCCGGTACT 2340

CCGTGACTAC CTGAGTTCCT TCCCCTTCCA GATCTGACCG GCTGCCGCCG TGCCCGCAGA 2400

ATTAAGTGGG AGCGCCTTAT TATTTCTTAT TATTAATTAT TATTATTTTT CTGGAACCAC 2460

GTGGGAGCCC TCCCCGCCTA GGTCGGAGGG AGTGGGTGTG GAGGGTGAGA TCCCTCCCAC 2520

TTCTGGCTGG AGACCTTATC CCGCCTCTCG GGGGGCCTCC CCTCCTGGTG CTCCCTCCCG 2580

GTCCCCCTGG TTGTAGCAGC TTGTGTCTGG GGCCAGGACC TGAACTCCAC GCCTACCTCT 2640

CCATGTTTAC ATGTTCCCAG TATCTTTGCA CAAACCAGGG GTGGGGGAGG GTCTCTGGCT 2700

TCATTTTTCT GCTGTGCAGA ATATTCTATT TTATXTTTTT ACATCCAGTT TAGATAATAA 2760

ACTTTATTAT GAAAGTTTTT TTTTTTAAAG AAACAAAGAT TTCTAGA 2807



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 212 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Val Ala Arg Asn Gln Val Glu Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ser Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser
20 25 30

Ser Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Val Val Pro Ala
35 40 45

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp
50 55 60

Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ser
85 90 95

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Asn Arg Glu Thr
130 135 140

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg
145 150 155 160

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205

Pro Phe Gln Ile
210



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1611 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 263..1529



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CGAATTCCGG GCGGGCTGTG TGAGTCTGTG AGTGGAAGGC GCGCCGGCTC TTTTGTCTGA 60 

GTGTGACCCG GTGGCTTTGT TCCAGGCATT CCGGTGATTT CCTCCGGGCA GTCCGCAGAA 120 

GCCGCAGCGG CCGCCCGCGC TCTCTCTGCA GTCTCCACAC CCGGGAGAGC CTGAGCCCGC 180 

GTCACGCCCC TCAGCCCCCG CTGAGTCCCT TCTCTGTTGT CGCGTCCGAA TCGAGTTCCC 240 

GGAATCAGAC GGTGCCCCAT AG ATG GCC AGC TTT CCC CCG AGG GTT AAC GAG 292
Met Ala Ser Phe Pro Pro Arg Val Asn Glu
1 5 10

AAA GAG ATC GTG AGA TCA CGT ACT ATA GGG GAA CTC TTG GCT CCA GCA 340
Lys Glu Ile Val Arg Ser Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala
15 20 25

GCT CCT TTT GAC AAG AAA TGT GGT GGT GAG AAC TGG ACG GTT GCT TTT 388
Ala Pro Phe Asp Lys Lys Cys Gly Gly Glu Asn Trp Thr Val Ala Phe
30 35 40

GCT CCT GAT GGT TCC TAG TTT GCG TGG TCA CAA GGA TAT CGC ATA GTG 436
Ala Pro Asp Gly Ser Tyr Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val
45 50 55

AAG CTT GTC GCG TGG TCC CAG TGC CGT AAG AAC TTT CTT TTG CAT GCT 484 
Lys Leu Val Pro Trp Ser Gln Cys Arg Lys Asn Phe Leu Leu His Gly
60 65 70 

TCC AAA AAT GTT ACC AAT TCA AGC TGT CTA AAA TTG GCA AGA CAA AAC 532 
Ser Lys Asn Val Thr Asn Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn
75 80 85 90 

AGT AAT GCT GGT CAG AAA AAC AAG CCT CCT GAG CAC GTT ATA GAC TGT 580 
Ser Asn Gly Gly Gln Lys Asn Lys Pro Pro Glu His Val Ile Asp Cys
95 100 105 

GGA GAC ATA GTC TGG AGT CTT GCT TTT GGG TCT TCA GTT CCA GAA AAA 628
Gly Asp Ile Val Trp Ser Leu Ala Phe Gly Ser Ser Val Pro Glu Lys
110 115 120 

CAG AGT CGT TGC GTT AAT ATA GAA TGG CAT CGG TTG CGA TTT GGA CAG 676
Gln Ser Arg Cys Val Asn Ile Glu Trp His Arg Phe Arg Phe Gly Gln
125 130 135

GAT CAG CTA CTC CTT GCC ACA GGA TTA AAC AAT GGT CGC ATC AAA ATC 724 
Asp Gln Leu Leu Leu Ala Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile
140 145 150 

TGG GAT GTA TAT ACA GGA AAA CTC CTC CTT AAT TTG GTA GAC CAC ATT 772
Trp Asp Val Tyr Thr Gly Lys Leu Leu Leu Asn Leu Val Asp His Ile
155 160 165 170 

GAA ATG GTT AGA GAT TTA ACT TTT GCT CCA GAT GGG AGC TTA CTC CTT 820

Glu Met Val Arg Asp Leu Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu 
175 180 185 

GTA TCA GCT TCA AGA GAC AAA ACT CTA AGA GTG TGG GAC CTG AAA GAT 868 
Val Ser Ala Ser Arg Asp Lys Thr Leu Arg Val Trp Asp Leu Lys Asp 
190 195 200 

GAT GGA AAC ATG GTG AAA GTA TTG CGG GCA CAT CAG AAT TGG GTG TAC 916
Asp Gly Asn Met Val Lys Val Leu Arg Ala His Gin Asn Trp Val Tyr 
205 210 215 

AGT TGT GCA TTC TCT CCC GAC TGT TCT ATG CTG TGT TCA GTG GGC GCC 964
Ser Cys Ala Phe Ser Pro Asp Cys Ser Met Leu Cys Ser Val Gly Ala
220 225 230 

ACT AAA GCA GTT TTC CTT TGG AAT ATG GAT AAA TAC ACC ATG ATT AGG 1012 
Ser Lys Ala Val Phe Leu Trp Asn Met Asp Lys Tyr Thr Met Ile Arg
235 240 245 250 

AAG CTG GAA GGT CAT CAC CAT GAT GTT GTA GCT TGT GAG TTT TCT CCT 1060
Lys Leu Glu Gly His His His Asp Val Val Ala Cys Asp Phe Ser Pro 
255 260 265 

GAT GGA GCA TTG CTA GCT ACT GCA TCC TAT GAC ACT CGT GTG TAT GTC 1108
Asp Gly Ala Leu Leu Ala Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val
270 275 280

TGG GAT CCA CAC AAT GGA GAC CTT CTG ATG GAG TTT GGG CAC CTG TTT 1156 
Trp Asp Pro His Asn Gly Asp Leu Leu Met Glu Phe Gly His Leu Phe 
285 290 295 

CCC TCG CCC ACT CCA ATA TTT GCT GGA GGA GCA AAT GAC CGA TGG GTG 1204 
Pro Ser Pro Thr Pro Ile Phe Ala Gly Gly Ala Asn Asp Arg Trp Val
300 305 310 

AGA GCT GTG TCT TTC AGT CAT GAT GGA CTG CAT GTT GCC AGC CTT GCT 1252
Arg Ala Val Ser Phe Ser His Asp Gly Leu His Val Ala Ser Leu Ala 
315 320 325 330 

GAT GAT AAA ATG GTG AGG TTC TGG AGA ATC GAT GAG GAT TGT CCG GTA 1300 
Asp Asp Lys Met Val Arg Phe Trp Arg Ile Asp Glu Asp Cys Pro Val
335 340 345 

CAA GTT GCA CCT TTG AGC AAT GGT CTT TGC TGT GCC TTT TCT ACT GAT 1348 
Gln Val Ala Pro Leu Ser Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp
350 355 360 

GGC AGT GTT TTA GCT GCT GGG ACA CAT GAT GGA AGT GTG TAT TTT TGG 1396 
Gly Ser Val Leu Ala Ala Gly Thr His Asp Gly Ser Val Tyr Phe Trp 
365 370 375 

GCC ACT CCA AGG CAA GTC CCT AGC CTT CAA CAT ATA TGT CGC ATG TCA 1444 
Ala Thr Pro Arg Gln Val Pro Ser Leu Gln His Ile Cys Arg Met Ser
380 385 390 

ATC CGA AGA GTG ATG TCC ACC CAA GAA GTC CAA AAA CTG CCT GTT CCT 1492 
Ile Arg Arg Val Met Ser Thr Gln Glu Val Gln Lys Leu Pro Val Pro
395 400 405 410

TCC AAA ATA TTG CCG TTT CTC TCC TAC CGC GGT TAG A CTOAAGACTG 1539 
Ser Lys Ile Leu Ala Phe Leu Ser Tyr Arg Gly *
415 420 

CCTTTCCTGG TAGGCCTGCC AGACAGAGCG CCCTTTACAA GACACACCTC AAGCTTTACC 1599

TCGTGCCGAA TT 1611



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Ala Ser Phe Pro Pro Arg Val Asn Glu Lys Glu Ile Val Arg Ser
1 5 10 15

Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala Ala Pro Phe Asp Lys Lys
20 25 30

Cys Gly Gly Glu Asn Trp Thr Val Ala Phe Ala Pro Asp Gly Ser Tyr
35 40 45

Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val Lys Leu Val Pro Trp Ser
50 55 60

Gln Cys Arg Lys Asn Phe Leu Leu His Gly Ser Lys Asn Val Thr Asn
65 70 75 80

Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn Ser Asn Gly Gly Gln Lys
85 90 95

Asn Lys Pro Pro Glu His Val Ile Asp Cys Gly Asp Ile Val Trp Ser
100 105 110

Leu Ala Phe Gly Ser Ser Val Pro Glu Lys Gln Ser Arg Cys Val Asn
115 120 125

Ile Glu Trp His Arg Phe Arg Phe Gly Gln Asp Gln Leu Leu Leu Ala
130 135 140

Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile Trp Asp Val Tyr Thr Gly
145 150 155 160

Lys Leu Leu Leu Asn Leu Val Asp His Ile Glu Met Val Arg Asp Leu
165 170 175

Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu Val Ser Ala Ser Arg Asp
180 185 190

Lys Thr Leu Arg Val Trp Asp Leu Lys Asp Asp Gly Asn Met Val Lys
195 200 205

Val Leu Arg Ala His Gln Asn Trp Val Tyr Ser Cys Ala Phe Ser Pro
210 215 220

Asp Cys Ser Met Leu Cys Ser Val Gly Ala Ser Lys Ala Val Phe Leu
225 230 235 240

Trp Asn Met Asp Lys Tyr Thr Met Ile Arg Lys Leu Glu Gly His His
245 250 255

His Asp Val Val Ala Cys Asp Phe Ser Pro Asp Gly Ala Leu Leu Ala
260 265 270

Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val Trp Asp Pro His Asn Gly
275 280 285

Asp Leu Leu Met Glu Phe Gly His Leu Phe Pro Ser Pro Thr Pro Ile
290 295 300

Phe Ala Gly Gly Ala Asn Asp Arg Trp Val Arg Ala Val Ser Phe Ser
305 310 315 320

His Asp Gly Leu His Val Ala Ser Leu Ala Asp Asp Lys Met Val Arg
325 330 335

Phe Trp Arg Ile Asp Glu Asp Cys Pro Val Gln Val Ala Pro Leu Ser
340 345 350

Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp Gly Ser Val Leu Ala Ala
355 360 365

Gly Thr His Asp Gly Ser Val Tyr Phe Trp Ala Thr Pro Arg Gln Val
370 375 380

Pro Ser Leu Gln His Ile Cys Arg Met Ser Ile Arg Arg Val Met Ser
385 390 395 400

Thr Gln Glu Val Gln Lys Leu Pro Val Pro Ser Lys Ile Leu Ala Phe
405 410 415

Leu Ser Tyr Arg Gly *
420

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 783 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CTGTCTTCCT CCGCAGCGCG AGGCTGGGTA CAGGGTCTAT TGTCTGTGGT TGACTCCGTA 60 

CTXreGTCTG AGGCCTTCGG GAGCTTTCCC GAGCCAGTTA GCAGAAGCCG CAGCGACCGC 120 

CCCCGCCCGT CTCCTCTGTC CCTGGGCCCG GGAGACAAAC TTGGCGTCAC GCCCTCAGCG 180 

GTCGCCACTC TCTTCTCTGT TGTTGGGTCC GCATCGTATT CCCGGAATCA GACGGTGCCC 240 

CATAGATGGC CAGCTTTCCC CCGAGGGTCA ACGAGAAAGA GATCGTGAGA TCACGTACTA 300 

TAGGTGAACT TTTAGCTCCT GCAGCTCCTT TTGACAAGAA ATGTGGTCGT GAAAATTGGA 360 

CTGTTGCTTT TGCTCCAGAT GGTTCATACT TTGCTTGGTC ACAAGGACAT CGCACAGTAA 420 

AGCTTGTTCC GTGGTCCCAG TGCCTTCAGA ACTTTCTCTT GCATGGCACC AAGAATGTTA 480

CCAATTCAAG CAGTTTAAGA TTGCCAAGAC AAAATAGTGA TGGTGGTCAG AAAAATAAGC 540 

CTCGTGACAT ATTATAGACT GTGGAGATAT AGTCTGGAGT CTTGCTTTTG GGTCATCAQT 600

TCCAGAAAAA CAGAGTCGCT GTGTAAATAT AGAATGGCAT CGCTTCAGAT TTGGACAAGA 660

TCAGCTACTT CTTGCTACAG GGTTGAACAA TGGGCGTATC AAAATATGGG ATGTATATCA 720 

GGAAACTCCT CCTTAACTTG GTAGATCATA CTGAAGTGGT CAGAGATTTA ACTTTTGCTC 780 

CAG 783 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1122 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CTCTGTATGT CTGAATGAAG CTATAACATT TGCCTTTTTA TTGCAGGTTT TCCTTTGGAA 60 

TATGGATAAA TACACCATGA TACGGAAACT AGAAGOACAT CACCATGATG TGGTAGCTTG 120 

TGACTTTTCT CCTGATGGAG CATTACTGGC TACTGCATCT TATGATACTC GAGTATATAT 180 

CTGGe?m;CA CATAATGGAG ACATTCTGAT GGAATTTGGG CACCTGTTTC CCCCACCTAC 240

TCCAATATTT GCTGGAGGAG CAAATGACCG GTGGGTACGA TCTGTATCTT TTAGCCATGA 300 

TGGACTGCAT GTTGCAAGCC TTGCTGATGA TAAAATGGTG AGGTTCTGGA GAATTGATGA 360 

GGATTATCCA GTGCAAGTTG CACCTTTGAG CAATGGTCTT TGCTGTGCCT TCTCTACTGA 420 

TGGCAGTGTT TTAGCTGCTG GGACACATGA CGGAAGTGTG TATTTTTGGG CCACTCCACG 480

GCAGGTCCCT AGCCTGCAAC ATTTATGTCG CATGTCAATC CGAAGAGTGA TGCCCACCCA 540 

AGAAGTTCAG GAGCTGCCGA TTCCTTCCAA OCTTTTGGAG TTTCTCTCGT ATCGTATTTA 600 

GAAGATTCTG CCTTCCCTAG TAGTAGGGAC TGACAGAATA CACTTAACAC AAACCTCAAG 660 

CTTTACTGAC TTCAATTATC TGTTTTTAAA GACGTAGAAG ATTTATTTAA TTTGATATGT 720

TCTTGTACTG CATTTTGATC AGTTGAGCTT TTAAAATATT ATTTATAGAC AATAGAAGTA 780

TTTCTGAACA TATCAAATAT AAATTTTTTT AAAGATCTAA CTGTGAAAAC ATACATACCT 840

GTACATATTT AGATATAAGC TGCTATATGT TGAATGGACC CTTTTGCXTT TCTGATTTTT 900

AGTTCTGACA TGTATATATT GCTTCAGTAG AGCCACAATA TGTATCTTTG GTGTAAAGTG 960

CAAGGAAATT TTAAATTCTG GGACACTGAG TTAGATGGTA AATACTGACT TACGAAAGTT 1020

GAATTGGGTG AGGCGGGCA^ ATCACCTGAG GTCAGCAGTT TGAGACTAGC CTGGCAAACA 1080

TGATGAAACC CTGTCTCTAC TAAAAATACA AAAAAAAAAA AA 1122



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2537 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 422..2029



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CGGCACGAGC CGGGCTCCGT CCGGAGGAAG CGAGGCTGCG CCGCCGGCCC GGCAGGAGCG 60

GAGGACGGGA GCGCGGGCGG TCGCGCTCGC CCTGTCGCTG ACTGCGCTGC CCCGGCCCAT 120

CCTTGCCTGG CCGCAGGTGC CCTGGATGAG GCCGCCGCGC GTGTCCCGGC CGCTGAGTGT 180

CCCCCGCGGT CCCCCGGCGC CTGCCCTCAA GCGGCCGCCT CTCCTTGCCC GGGTCCCCGT 240

TTTCCCCCGG CGCAGTCCTC CTCCGGTGGG CGCCTCCGCA CCTCGGCGCA GGCGGCACGG 300

CCCTCGGGCC GGGATGGATC CGCCGGGAAG AGGAAGACAA GCCGGGGCGT TGAGCCCCTG 360

CGCACGGTGC CGCCGCGCGT AGTGGGAGCT TACTCGCAGT AGGCTCTCGC TCTTCTAATC 420

A ATG GAT AAA GTG GGG AAA ATG TGG AAC AAC TTA AAA TAC AGA TGC 466
Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys
1 5 10 15

CAG AAT CTC TTC AGC CAC GAG GGA GGA AGC CGT AAT GAG AAC GTG GAG 514
Gln Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu
20 25 30

ATG AAC CCC AAC AGA TGT CCG TCT GTC AAA GAG AAA AGC ATC AGT GTG 562
Met Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu
35 40 45

GGA GAG GCA GCT CCC CAG CAA GAG AGC AGT CCC TTA AGA GAA AAT GTT 610
Gly Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val
50 55 60

GCC TTA CAG CTG GGA CTG AGC CCT TCC AAG ACC TTT TCC AGG CGG AAC 658
Ala Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn
65 70 75

CAA AAC TGT GCC GCA GAG ATC CCT CAA GTG GTT GAA ATC AGC ATC GAG 706
Gln Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu
80 85 90 95

AAA GAC AGT GAC TCG GGT GCC ACC CCA GGA ACG AGG CTT GCA CGG AGA 754
Lys Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg
100 105 110

GAC TCC TAC TCG CGG CAC GCC CCG TGG GGA GGA AAG AAG AAA CAT TCC 802
Asp Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser
115 120 125

TGT TCC ACA AAG ACC CAG AGT TCA TTG GAT ACC GAG AAA AAG TTT GGT 850
Cys Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly
130 135 140

AGA ACT CGA AGC GGC CTT CAG AGG CGA GAG CGG CGC TAT GGA GTC AGC 898
Arg Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser
145 150 155

TCC ATG CAG GAC ATG GAC AGC GTT TCT AGC CGC GCG GTC GGG AGC CGC 946
Ser Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg
160 165 170 175

TCC CTG AGG CAG AGG CTC CAG GAC ACG GTG GGT TTG TGT TTT CCC ATG 994
Ser Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met
180 185 190

AGA ACT TAG AGC AAG CAG TCA AAG CCA CTC TTT TCC AAT AAA AGA AAA 1042
Arg Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys
195 200 205

ATA CAT CTT TCT GAA TTA ATG CTG GAG AAA TGC CCT TTT CCV GCT GGC 1090
Ile His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly
210 215 220

TCG GAT TTA GCA CAA AAG TGG CAT TTG ATT AAA CAG CAT ACC GCC CCT 1138
Ser Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro
225 230 235

GTG AGC CCA CAC TCA ACA TTT TTT GAT ACA TTT GAT CCA TCA CTG GTG 1186
Val Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val
240 245 250 255

TCT ACA GAA GAT GAA GAA GAT AGG CTT CGC GAG AGA AGA CGG CTT AGT 1234
Ser Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser
260 265 270

ATC GAA CAA GGG GTG CAT CCC CCT CCC AAC GCA CAA ATA CAC ACC TTT 1282
Ile Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe
275 280 285

GAA eer ACT GCA CAG GTC AAC CCA TTG TAT AAG CTG GGA CCA AAG TTA 1330
Glu Ala Thr Ala Gln Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu
290 295 300

GCT CCT CGG ATG ACA GAG ATA AGT GGA GAT GGT TCT GCA ATT CCA CAA 1378
Ala Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln
305 310 315

GCA ATT GTG ACT CAG AAG AGG ATT CAA CCA CCC TAT GTC TGC AGT CAC 1426
Ala Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His
320 325 330 335

GGA GGC AGA AGC AGC GCC AGG TGT CCG GGG ACA GCC ACG CGC ACG TTA 1474
Gly Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu
340 345 350

GCA GAC AGG GAG CTT GGA AAG TTC ATA CGC AGA TCG ATT ACA TAG ACT 1522
Ala Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr
355 360 365

GCC TCO TGC CAG ATT TGC TTC AGA TCA CAG GGA ATC CCT GTT ACT GGG 1570
Ala Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly
370 375 380

GCG TGA TGG ACC GAT ACG AGG CCG AAG CCC TTC TAG AAG GGA AAC CGG 1618
Ala * Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg
385 390 395 

AAG GCA CGT TCT TGC TCA GGG ACT CTG CAC AGG AGG ACT ACC TCT TCT 1666 
Lys Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser 
400 405 410 415 

CTG TGA GCT TCC GCC GCT ACA ACA GGT CTC TGC ACG CCC GGA TCG AGC 1714 
Leu * Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser 
420 425 430 

AGT GGA ACC ACA ACT TCA GCT TCG ATG CCC ATG ACC CCT GCG TGT TTC 1762
Ser Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe
435 440 445

ACT CCT CCA CGT CAC GGG GCT TCT CGA ACA CTA TAA AGA CCC CAG CTC 1810
Thr Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu
450 455 460

TTG CAT GTT TTT TGA ACC GTT GCT AAC GAT ATC ACT GAA TAG AAC TTT 1858
Leu His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe
465 470 475

CCC TTT CAG CCT GCA GTA TAT CTG CCG CGC AGT GAT CTG CAG ATG CAC 1906
Pro Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His
480 485 490 495

TAC GTA TGA TGG GAT TGA CGG GCT CCC GCT ACC GTC GAT GTT ACA GGA 1954 
Tyr Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly 
500 505 510

TTT TTT AAA AGA GTA TCA TTA TAA ACA AAA AGT TAG GGT TCG CTG GTT 2002
Phe Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val
515 520 525

AGA ACG AGA CCA GTC AAA GCA AAG TAACTCCTGT CCCCAAAGGG CACTAACTAA 2056
Arg Thr Arg Pro Val Lys Ala Lys
530 535

QTCTGCTCCT CCCGTGCATC GAACTGCACC CATAGGAGGC AGTCAGCTGC TAGGATTTCC 2116 

CACCCAGAAT GGGAGCTTAG TCATTAGCCT CTGCCCTATG GGGTCCGCTG TTCCTCAGAC 2176 

AAftGGTGCCT AGQGACAGCA AGATGGCTTG CAGGTGTTCG GTGGGCTGTG ACAACTGAGG 2236 

GAGGCAACTC TGOGGCATTT GCTATGAAGA ATTCTATTTC TTACCGAAGA ACAAATTATT 2296 

AATATTGGAT GGGTATTTCA ATAGTGTGAC TAATGTTTCA AATTATOTTT TCTAAGAATT 2356 

TTTCTATAAC CTTCAQAAAA AGTAGTGATG TTTGTAGTTA CTATAAATCA AGCTTTGAAA 2416 

GTTCAAAACA AACAAGTTAA ATAAAAGACT ACCTTCCTTT TAGAGAAAAC AAATGCAAGT 2476 

TTTCCCAGCC ACAQGCATTG TGCACTGTTA ATGTTGCTTO TTATCAGCTC CTTTCTCCTC 2535

C 2537 



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 535 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys Gln
1 5 10 15

Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu Met
20 25 30

Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu Gly
35 40 45

Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val Ala
50 55 60

Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn Gln
65 70 75 80

Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu Lys
85 90 95

Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg Asp 
100 105 110 

Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser Cys 
115 120 125 

Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly Arg
130 135 140

Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser Ser
145 150 155 160

Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg Ser
165 170 175

Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met Arg
180 185 190

Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys Ile
195 200 205 

His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly Ser 
210 215 220 

Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro Val
225 230 235 240 

Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val Ser 
245 250 255 

Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser Ile
260 265 270

Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe Glu
275 280 285

Ala Thr Ala Gln Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu Ala
290 295 300

Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln Ala
305 310 315 320

Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His Gly
325 330 335

Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu Ala
340 345 350

Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr Ala
355 360 365

Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly Ala
370 375 380

* Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg Lys 
385 390 395 400 

Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser Leu 
405 410 415 

* Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser Ser
420 425 430

Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe Thr
435 440 445

Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu Leu
450 455 460

His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe Pro
465 470 475 480

Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His Tyr
485 490 495

Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly Phe
500 505 510

Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val Arg
515 520 525 



Thr Arg Pro Val Lys Ala Lys 
530 535

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1221 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GATTAAACAG CATACAGCTC CTGTGAGCCC ACATTCAACA TTTTTTGATA CTTTGATCCA 60 

TCTTTGGTTT CTACAGAAGA TGAAGAAGAT AGGCTTAGAG AGAGAAGGCG GCTTAGTATT 120 

GAAGAAGGGG TTGATCCCCC TCCCAATGCA CAAATACATA CATTTGAAGC TACTGCACAG 180 

GTTAATCCAT TATTAAACTG GGACCAAAAT TAGCTCCTGG AATGACTGAA ATAAGTGGGG 240 

ACAGTTCTGC AATTCCACAA GCTAATTGTG ACTCGGAAGA GGATACAACC ACCCTGTGTT 300 

GCAGTCACGG AGGCAGAAGC AGCGTCAGAT ATCTGGAGAC AGCCATACCC ATGTTAGCAG 360 

ACAGGGAGCT TGGAAAOTCC ACACACAGAT TGATTACATA CACTGCTTCG TGCCTGATTT 420 

GCTTeftAATT ACAGGGAATC CCTGTTACTG GGGAGTGATG GACCGTTATG AAGCAGAAGC 480 

CCTTCTCGAA GGGAAACCTG AAGGCACGTT TTTGCTCAGG GACTCTGCGC AAGAGGACTA 540 

CTTCTTCTCT GTGAGCTTCC GCCGATACAA CAGATCCCTG CATGCCCGAA TTGAGCAGTG 600 

GAATCACAAC TTTAGTTTCG ACGCCCATOA CCCGTGTGTA TTTCACTCCT CCACTGTAAC 660 

GGGACTTTTA GAACATTATA AAGATCCCAG TTCGTGCATG TTTTTTGAAC CATTGCTTAC 720

TATATCACTA AATAGGACTT TCCCTTTTAG CCTGCAGTAT ATCTGTCGCG CGGTAATCTG 780 

CAGGTGCACT ACGTATGATG GAATTGATGG GCTCCCTCTA CCCTCAATGT TACAGGATTT 840 

TTTAAAAGAG TATCATTATA AACAAAAAGT TAGAGTTCGC TGGTTGGAAC GAGAACCAGT 900 

CAAGGCAAAG TAAACTCTCC GGTCCCCAAA GGGTGTTAAC TAGGTCCGCT TTCATGTGCA 960

TCAGACAGTA CACCTATAGC AAGCACACGT AGCAGTGTTA GGCTTTTTCA TACAGTATGT 1020

AAGCTTAGTG TTAGTATCTG TCAGATGCTA CCTGCTGTTA CTTATTCAGA TAAACATGGT 1080 

GCCTATTGGA ACAATAGCGG ATAGAGCTAC AGGTGTTCAG TAAGACTACA AAAACATTTT 1140 

GCCTATTTCG CTAACAGTTT GGTTTTTAAT GGCTGTGGTA TTTGAGTGAG GCAACTCTGG 1200

GGCATTTOTT ATGAAGAAAT G 1221 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2369 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 116..1330

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:



GGCACGAGGC GGTGGTGGCG GCGGCGGGCG CGGCCGCGGC GGGGCGGGCG CGGAATGAAG 60 

GCCCACGGCC CTGGGGGCTG AGGCGCCCGC CGCCTGGGGC GGGCCGCGCG TCCTC ATG 118 

Met
1 

GAG GCC GGA GAG GAG CCG CTG CTG CTG GCT GAA CTC AAG CCT GGG CGC 166 
Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly Arg 
5 10 15 

CCC CAC CAG TTC GAC TGG AAG TCA AGC TGC GAG ACC TGG AGC GTG GCC 214 
Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val Ala
20 25 30

TTC TGG CCA GAC GGT TCC TGG TTC GCC TGG TCT CAA GGA CAC TGC GTG 262
Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys Val
35 40 45

GTC AAG CTG GTC CCC TGG CCC TTA GAG GAA CAG TTC ATC CCT AAA GGA 310
Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys Gly
50 55 60 65

TTC GAA GCC AAG AGC CGA AGC AGC AAG AAT GAC CCA AAA GGA CGG GGC 358 
Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg Gly 
70 75 80 

AGT CTG AAG GAG AAG ACG CTG GAC TGT GGC CAG ATT CTG TGG GGQ CTG 406
Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly Leu
85 90 95

GCC TTC AGC CCG TGG CCC TCT CCA CCC AGC AGG AAA CTC TGG GCA CGT 454 
Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala Arg 
100 105 110 

CAC CAT CCC CAG GCG CCT GAT GTT TCT TGC CTG ATC CTG GCC ACA GGT 502
His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr Gly
115 120 125

CTC AAC GAT GGG CAG ATC AAG ATT TGG GAG GTA CAG ACA GGC CTC CTG 550
Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu Leu
130 135 140 145

CTX CTG AAT CTT TCT GGC CAC CAA GAC GTC GTG AGA GAT CTG AGC TTC 598
Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser Phe
150 155 160

ACG CCC AGC GGC AGT TTG ATT TTG GTC TCT GCA TCC CGG GAT AAG ACA 646
Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys Thr
165 170 175

CTT CGA ATT TGG GAC CTG AAT AAA CAC GGT AAG CAG ATC CAG GTG TTA 694
Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val Leu
180 185 190

TCC GGC CAT CTG CAG TGC GTT TAC TGC TGC TCC ATC TCC CCT GAC TGT 742
Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp Cys
195 200 205

AGC ATG CTG TGC TCT GCA GCT GGG GAG AAG TCG GTC TTT CTG TGG AGC 790 
Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp Ser
210 215 220 225

ATG CGG TCC TAG ACA CTA ATC CGG AAA CTA GAA GGC CAC CAA AGC AGT 838
Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser Ser
230 235 240

GTT GTC TCC TGT GAT TTC TCT CCT GAT TCA GCC TTG CTT GTC ACA GCT 886
Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr Ala
245 250 255

TCG TAT GAC ACC AGT GTG ATT ATG TGG GAC CCC TAC ACC GGC GCG AGG 934
Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala Arg
260 265 270

CTG AGG TCA CTT CAT CAC ACA CAA CTT GAA CCC ACC ATG GAT GAC AGT 982
Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp Ser
275 280 285

GAC GTC CAC ATG AGC TCC CTG AGG TCC GTG TGC TTC TCA CCT GAA GGC 1030
Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu Gly
290 295 300 305

TTG TAT CTC GCT ACG GTG GCA GAT GAC AGG CTG CTC AGG ATC TGG GCT 1078
Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp Ala
310 315 320

CTG GAA CTG AAG GCT CCG GTT GCC TTT GCT CCG ATG ACC AAT GGT CTT 1126 
Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly Leu 
325 330 335 

TGC TGC ACG TTC TTC CCA CAC GGT GGA ATT ATT GCC ACA GGG ACG AGA 1174
Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr Arg
340 345 350

GAT GGC CAT GTC CAG TTC TGG ACA GCT CCC CGG GTC CTG TCC TCA CTG 1222
Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser Leu
355 360 365

AAG CAC TTA TGC AGG AAA GCC CTC CGA AGT TTC CTG ACA ACG TAT CAA 1270
Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr Gln
370 375 380 385

GTC CTA GCA CTG CCA ATC CCC AAG AAG ATG AAA GAG TTC CTC ACA TAC 1318
Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr Tyr
390 395 400

AGG ACT TTC TAGCAGTGCC GGCTCCCCCA CCTCCTGCAG CAGCAGCAGT 1367
Arg Thr Phe
405

ACAAGGGACT GGCTAGGATG GAGTCAGGCA GCTCACACTG GACCAGTGTG GACCTTCCTT 1427 

CCTCCCATGO CATGTGCAAG TAGGTCTGCG TGACCCCACT TCTQTGGTGC CGGCCTTACC 1487 

TCOTCTTCAT CCGTGGTGAG CAGCCTCCGT CAGTCTAGTT GTGTTGAAGC CAAGTGCAGT 1547 

TGTGGATGTT GCTGGGGTAA TAAAGGCAAG CGGGCTCCAG AGCCTCTCTG GTGGCGGCCA 1607 

AGCCACACTC CCTTAACTGG GAAGTACCTG CCACGTAGGG CATTTCTGCT GCCTATTTCC 1667 

AGCCAGCGGC TGCATGGTTT GAAGTTCCTC CGTTGTGGTC AGAAGAACTC TGGTGTTTGG 1727 

TTCCCTGCTC AGCTGCGCGT GGACTGGGCT GAGCTCCTCA CCATACACTA GTGCCGGCTT 1787

TTGTTTCCTG TAAACAGTGG TTGCATGTGT AGAGAAGTAA CAAGCGAGTA TTCAGATCAT 1847 

ACGAGGAGGC GTTCCTCGGT GCATGACGGO? CAGATGGCCA TTTATCAGCA TATTTATTTG 1907 

TATTTTCTCA GCACATAGTA AGGTACAACT GTGTTTTCTC AATTGTCTCG AAAAAACAGA 1967 

GTTCTTAAGT GGCCCAGTTG TGCAGCCAAG TCTAAGTCGT GTGGAGTCAG TGCTGACATC 2027

ACTGGCTTGT GC^rGTCTGTC ACATGTGTTT GTCTCTGCTG CTTGACCTCA TGGGATGTAC 2087

CCTCeteSTTC AACTCCCCAA AACAGACAGC CCCTTCCAAG CACCGTTCTT TGACAGCGGT 2147 

AGCAGCTACC TATTCAAGAC GCCTCACACA AAATCTGCCT TAGAAAGTTA ATATATTTTA 2207 

AATTATTTTA AAAGAAACTC AACATCTTAT TCTTTGGCCT TTCTTAATTG ATGCTTTATG 2267 

GAGGCAGTGT TAACATTrGTA CAGTGTATGC ATAGAGGAGT CTCCTCTA7T TGAAGAACAA 2327 

TGCAAAATGA GGCTTTCATT GAAGGGAAAA AAAAAAAAAA AA 2369 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Met Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly
1 5 10 15

Arg Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val
20 25 30

Ala Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys
35 40 45

Val Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys
50 55 60 

Gly Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg 
65 70 75 80 

Gly Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly
85 90 95

Leu Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala
100 105 110

Arg His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr
115 120 125

Gly Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu
130 135 140

Leu Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser
145 150 155 160

Phe Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys
165 170 175

Thr Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val
180 185 190

Leu Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp
195 200 205

Cys Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp
210 215 220

Ser Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser
225 230 235 240 

Ser Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr 
245 250 255 

Ala Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala
260 265 270

Arg Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp
275 280 285 

Ser Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu 
290 295 300 

Gly Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp
305 310 315 320

Ala Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly
325 330 335

Leu Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr
340 345 350

Arg Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser
355 360 365

Leu Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr
370 375 380

Gln Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr
385 390 395 400 

Tyr Arg Thr Phe 



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1246 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GACACTGCAT CGTCAAACTG ATCCCCTGGC CGTTGGAGGA GCAGTTCATC CCTAAAGGGT 60 

TTGAAGCCAA AAGCCGAAGT AGCAAAAATG AGACGAAAGG GCGGGGCAGC CCAAAAGAGA 120

AGACGCTGGA CTGTGGTCAG ATTGTCTGGG GGCTGGCCTT CAGCCTGTGC TTTCCCCACC 180 

CAGCAGGAAG CTCTGGGCAC GCCACCACCC CCAAGTGCCC GATGTCTCTT GCCTGGTTCT 240 

XGCTACGGGA CTCAACGATG GGCAGATCAA GATCTGGGAG GTGCAGACAG GGCTCCTGCT 300

TTTGAATCTT TCCGGCCACC AAGATGTCGT GAGAGATCTG AGCTTCACAC CCAGTGGCAG 360

TTTGATTTTG GTCTCCGCGT CACGGGATAA GACTCTTCGC ATCTGGGACC TGAATAAACA 420

CGGTAAACAG ATTCAAGTGT TATCGGGCCA CCTGCAGTGG GTTTACTGCT GTTCCATCTC 480 

CCCAGACTGC AGCATGCTQT GCTCTGCAGC TGGAGAGAAG TCGGTCTTTC TATGGAGCAT 540 

GAGGTCCTAC ACGTTAATTC GGAAGCTAGA GGGCCATCAA AGCAGTGTTG TCTCTTGTGA 600 

CTTCTCCCCC GACTCTGCCC TGCTTGTCAC GGCTTCTTAC GATACCAATG TGATTATGTG 660 

GGACCCerAC ACCGGCGAAA GGCTGAGGTC ACTCCACCAC ACCCAGGTTG ACCCCGCCAT 720 

GGATGACAGT GACGTCCACA TTAGCTCACT GAGATCTGTG TGCTTCTCTC CAGAAGGCTT 780

GTACCTTGCC ACGGTGGCAG ATGACAGACT CCTCAGGATC TGGGCCCTGG AACTGAAAAC 840 

TCCCATTGCA TTTGCTCCTA TGACCAATGG GCTTTGCTGG CACATTTTTT CCACATGGTG 900 

GAGTCATTGC CACAGGCACA AGAGATGGCC ACGTCCAGTT CTGGACAGCT CCTAGGGTCC 960 

TGTCCTCACT GAAGCACTTA TGCCQGAAAG CCCTTCGAAG TTTCCTAACA ACTTACCAAG 1020 

TCCTAGCACT GCCAATCCCC AAGAAAATGA AAGAGTTCCT CACATACAGG ACTTTTTAAG 1080

CAACACCACA TCTTGTGCTT CTTTGTAGCA GGGTAAATCG TCCTGTCAAA GGGAGTTGCT 1140 

GGAATAATGG GCCAAACATC TGGTCTTGCA TTGAAATAGC ATTTCTTTGG GATTGTGAAT 1200

AGAATGTAGC AAAACCAGAT TCCAGTGTAC TAGTCATGGA TTTTTC 1246



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

ACCATGGTTC CAAGTCCTCT CCCCTGTGGT CAAGTTGCCC GAATGTTGGG CCCAAGTGCC 60

TTTTCCTCCT TGGGCCTCCC CTTCTGACCT GCAGGACAGT TTTCCGGAGC CCATTTGGTA 120

TGAGGTATTA ATTAGCCTTA ACTAAATTAC AGGGGACTCA GAGGCCGTGC TCCTGACCGA 180 

TCCAGACACT ATTTTTTTTT TTTTTTTTTA ACAATGGTGT GCATGTGCAG GAAATGACAA 240 

ATTTGTATGT CAGATTATAC AAGGATGTAT TCTTAAACCG CATGACTATT CAGATGGCTA 300 

CTGAGTTATC AGTGGCCATT TATTAGCATC ATATTTATTT GTATTTTCTC AACAGATGTT 360 

AAGGTACAAC TGTGTTTTTC TCGATTATCT AAAAACCATA GTACTTAAAT TGAAAAAAAA 420 

AA 422 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2019 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GGCACGAGGC GGGGTCAGGG CGGAGGCTGA GGACCAAGTA GGCATGGCGG AGGGCGGGAC 60 

CGQCCCCGM} GGACGGGCCG GCCCGGGACC CGCAGGTCCT AATCTGAAGG AGTGGCTGAG 120 

GGAGCAGTTC TGTGACCATC CACTGGAGCA CTGTGACGAT ACAAGACTCC ATGATGCAGC 180 

CTATGTAGGG GACCTCCAGA CCCTCAGGAA CCTACTGCAA GAGGAGAGCT ACCGGAGCCG 240 

CATCAATGAG AAGTCTGTCT GGTGCTGCGG CTGGCTTCCC TGCACACCAC TGAGGATCGC 300

AGCCACTGCA GGCCATGGGA ACTGTGTGGA CTTCCTCATA CGCAAAGGGG CCGAGGTGGA 360

CCTGGTGGAT GTCAAGGGGC AGACTGCCCT GTATGTGGCT GTAGTGAACG GGCACTTGGA 420 

GAGCACTGAG ATCCTTTTGG AAGCTGGTGC TGATCCCAAC GGCAGCCGGC ACCACCGCAG 480 

CACTCCTGTG TACCATGCCT YTCGTGTGGG TAGGGACGAC ATCCTGAAGG CTCTTATCAG 540

GTATGGGGCA GATGTTGATG TCAACCATCA TCTGAATTCT GACACCCGGC CCCCTTTTTC 600 

ACGGCGGCTA ACCTCCTTGG TGGTCTGTCC TCTATACATC AGTGCTGCCT ACCATAACCT 660 

TCAGTGCTTC AGGCTGCTCT TGCAGGCTGG GGCAAATCCT GACTTCAATT GCAATGGCCC 720 

TGTCASCACC CAGGAGTTCT ACAGGGGATC CCCTGGGTGT GTCATGGATG CTGTCCTGCG 780 

CCATGGCTGT GAAGCAGCCT TCGTGAGTCT GTTGGTAGAG TTTGGAGCCA ACCTGAACCT 840 

GGTGAAGTGG GAATCCCTGG GCCCAGAGGC AAGAGGCAGA AGAAAGATGG ATCCTGAGGC 900 

CTTGCAGGTC TTTAAAGAGG CCAGAAGTAT TCCCAGGACC TTGCTGAGTT TGTGCCGGGT 960 

GGCTGTGAGA AGAGCTCTTG GCAAATACCG ACTGCATCTG GTTCCCTCGC TGCCGCTGCC 1020 

AGACCCCATA AAGAAGTTTT TGCTTTATGA GTAGCATTCA CATGCAGTGC TGACTGCAAT 1080 

GTGGAAGCCG ATCACCTGCA GTGAAAACTG ACACAGACTC TGGCATCCTG GGAACCATGG 1140 

CCTGTGCTGC CAGCTTGATC CTTGGCTGTC AGTGAAGAAA AAACGGCTGT GTTCTCTTGG 1200 

ACTGTGATTC TATCTCAGGT GCTTGGGCCA TCGAACGCTC CTTGAGTCAT TGTCAACTGA 1260

GAGGCACATA CAAACTTAAT TTTGTTCCTC TTCAGTCTCT CTGTTTTGGA TTCTTCCTGG 1320

CAATGTGTGC AGCATGGGCT GAGCCTGGTG ATTGCCCTAG TGGGGAAGGC TTTTTTCTCC 1380

AGGCTATGCA TCTATTTATG TTCCXACTTT GCAATTTATT GTTCTTTTAA GGCTTGATAT 1440

CAAAACAGAA AGAGGTTTGT TAAGAAAAGA TATAGGGAGA AAGGAATTCC GGTTCCGTGC 1500

ACTTGCTAGC CTGCTTTCCT TGCCTGGGTT TGTCTGTCTA TGCTGCCTGG TGCACATCCC 1560

TTCTCTTTGC TGCCACTGTT CTATTTTGGG AGTTGTCTTC CGTCTAAGAT GGCTTCTGGG 1620

GTTCTATCTT ATTGCACAGA GGTCCCAGAA CAGTGTTCAT AGGGCACCAT CTGCTCTGCC 1680

AAGGGTTTTC TGATGTCTTA CCCTGGGGAT CTTCAGACAG TGGTTACCTX TAGGAGACCC 1740

ACCTGGAACT AACCATTAAG TGACTGCCCA CATTCAGATC AGGGACCATC TTAATAGTAC 1800

TCACTGCCAG TCCTCACAAG AGAAGATGAC ACGGGTGCTC TCTTCAGACA CTCCCATACA 1860

GGAAGTTGGA AAATGTCTTG GTCACCTGGG TTGTTCCCAG GCTACAACTT CTTGGTGTTC 1920

CACTAARACC AGRATATCCT AGTTTTTTGG GTTGACTGTT CCCTCCCCAC TTTCCTTGAA 1980

NCCCAATGCC CNTTTGTKTN GGTTGCTTCC CTAAAAKTT 2019

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 350 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Ala Arg Gly Gly Val Arg Ala Glu Ala Glu Asp Gln Val Gly Met Ala
1 5 10 15

Glu Gly Gly Thr Gly Pro Asp Gly Arg Ala Gly Pro Gly Pro Ala Gly
20 25 30

Pro Asn Leu Lys Glu Trp Leu Arg Glu Gln Phe Cys Asp His Pro Leu
35 40 45

Glu His Cys Asp Asp Thr Arg Leu His Asp Ala Ala Tyr Val Gly Asp
50 55 60

Leu Gln Thr Leu Arg Asn Leu Leu Gln Glu Glu Ser Tyr Arg Ser Arg
65 70 75 80

Ile Asn Glu Lys Ser Val Trp Cys Cys Gly Trp Leu Pro Cys Thr Pro
85 90 95

Leu Arg Ile Ala Ala Thr Ala Gly His Gly Asn Cys Val Asp Phe Leu
100 105 110

Ile Arg Lys Gly Ala Glu Val Asp Leu Val Asp Val Lys Gly Gln Thr
115 120 125

Ala Leu Tyr Val Ala Val Val Asn Gly His Leu Glu Ser Thr Glu Ile
130 135 140 

Leu Leu Glu Ala Gly Ala Asp Pro Asn Gly Ser Arg His His Arg Ser 
145 150 155 160 

Thr Pro Val Tyr His Ala Xaa Arg Val Gly Arg Asp Asp Ile Leu Lys
165 170 175

Ala Leu Ile Arg Tyr Gly Ala Asp Val Asp Val Asn His His Leu Asn
180 185 190

Ser Asp Thr Arg Pro Pro Phe Ser Arg Arg Leu Thr Ser Leu Val Val
195 200 205

Cys Pro Leu Tyr Ile Ser Ala Ala Tyr His Asn Leu Gln Cys Phe Arg
210 215 220

Leu Leu Leu Gln Ala Gly Ala Asn Pro Asp Phe Asn Cys Asn Gly Pro
225 230 235 240

Val Asn Thr Gln Glu Phe Tyr Arg Gly Ser Pro Gly Cys Val Met Asp
245 250 255

Ala Val Leu Arg His Gly Cys Glu Ala Ala Phe Val Ser Leu Leu Val
260 265 270

Glu Phe Gly Ala Asn Leu Asn Leu Val Lys Trp Glu Ser Leu Gly Pro 
275 280 285 

Glu Ala Arg Gly Arg Arg Lys Met Asp Pro Glu Ala Leu Gln Val Phe
290 295 300

Lys Glu Ala Arg Ser Ile Pro Arg Thr Leu Leu Ser Leu Cys Arg Val
305 310 315 320

Ala Val Arg Arg Ala Leu Gly Lys Tyr Arg Leu His Leu Val Pro Ser
325 330 335

Leu Pro Leu Pro Asp Pro Ile Lys Lys Phe Leu Leu Tyr Glu
340 345 350 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 419 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

GCATCCATGG CGGAGGGCGG CAGCACGACG GGCGGGCAGG GCCGGGCTCC GCAGGTCGTA 60 

ATCTGAAGGA GTGGCTGAGG GAQCAATTTT GTGATCATCC GCTGGAGCAC TGTGAGGACA 120 

CGAGGCTCCA TGATGCAGCT TACGTCGGGG ACCTCCAGAC CCTCAGGAGG CTATTGCAAG 180

AGGAGAGCTA CCGGAGCCGC ATCAACGAGA AGTCTGTCTG GTGCTGTGGC TGGCTCCCCT 240 

GCACACCGTT GCGAATCGCG GCCACTGCAG GCCATGGGAG CTGTGTGGAC TTCCTCATCC 300 

GGAAGGGGGC CGAGGTGGAT CTGGTGGACG TAAAAGCACA GACGGCCCTG TATGTGGCTG 360

TGGTGAACGG GCACCTAGAG AGTACCCAGA TCCTTCTCGA AGCTGGCGCG GACCCCAAC 419

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 595 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
GAGGAAGAAG AAAAGTGGAC CCTGAGGCCT TGCAGGTCTT TAAAGAGGCC AGAAGTGTTC 60

CCAGAACCTT GCTGTGTCTG TGCCGTGTGG CTGTGAGAAG AGCTCTTGGC AAAACCGGCT 120

TCATCTGATT CCTTCGCTGC CTCTGCCAGA CCCCATAAAG AACTTTCTAC TCCATGAGTA 180

GACTCCAAGT GCTGCGGTTG ATTCCAGTGA GGGAGAAAGT GATCTGCAGG GAGGTGGACA 240

CCGAGCCCTG AGTGCTGTGC TGCTGCTGGT CTCCTGATGG CTGTTGCTGC AGAAGATGTC 300

CTCGTAGACT GTCATTGCTC CTCAGGTGCC TGGGCCGCTG AACAGTCCTT GGGTCATTGT 360

CAGCFGAGAG GCTTATACTA AAGTTATTAT TGTTTTTCCC AAGTTCTCTG TTCTGGATTT 420

TCAGTTGCAT ATTAATGTAA CGGGCCATGG GGTATGTACA TGTAGGGGCT GAGGTTGGAG 480

GCCTACTAAT TTCCTGTAGG GAAGACTCCC AGCACTTCTG GAACTGTGCT TCTCTTTATT 540

TTTCTACTTC TCAATTTGAT GGTTCGATTA AAGCCTTCTA GTATCTCAAT GAAAA 595

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 896 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 4..39^

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CTG ATG TCC GCA ATT CTG AAG GTT GGA CAC CAC TGC TGG CTG CCT GTG 48
Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val
1 5 10 15

ACA TCC GCT GTC AAT CCC CAA AGG ATG CTG AGG CCA CCA CCA ACC GCT 96
Thr Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala
20 25 30

GTT TTC AAC TGT GCC GCT TGC TGC TGT CTG TGG GGG CAG ATG CTG ATG 144
Val Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met
35 40 45

AAT ACA TAG CGT GTA GTT CAG CTT CCT GAG GAG GCC AAG GGC TTG GTG 192
Asn Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val
50 55 60

CCA CCA GAG ATT CTA CAG AAG TAC CAT GGA TTC TAC TCT TCC CTC TTT 240
Pro Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe
65 70 75

GCC TTG GTG AGG CAG CCC AGG TCG CTG CAG CAT CTC TGC CGT TGT GCG 288
Ala Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala
80 85 90 95

CTC CGC AGT CAC CTG GAG GGC TGT CTG CCC CAT GCA CTA CCG CGC CTT 336
Leu Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu
100 105 110

CCC CTG CCA CCG CGC ATG CTC CGC TTT CTG CAG CTG GAC TTT GAG GAT 384
Pro Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp
115 120 125



CTG CTC TAC TAGGCTTGCT GCCCTGTGAA CAAAGCAGAC CCCACCCCCA 433
Leu Leu Tyr
130

CCCCAAGGGC ATCTCTCAGC AATGAATGAT GCAAGGCGGT CTGTCTTCAA GTCAGGAGTG 493

GACGCCTTGA TCCACACTTG AGAGAAGAGG CCAGATCAGC ACCYGGCTGG TAGTGATNGC 553 

AGAGGGCACC TGTGCAGATC TGTGTGCGCA CTGGAAATCT CTAGGCTGAA GGCYAGAGCA 613 

AATGGTGCAR GTGTTAGTCC TTGGGANGAG AGACAGAWGG TGAGAAAGCA AGACAGAGGX 673 

GAGAGTGCAC ATGTCAAGTG GTAGATTGCC TTAAAAGAAA GCTAAAAAAA GAAAAAGArT 733 

CGGGCGAACT TCTTTAGGGG TAATGCTGCA GCGTGTTAAA CTGACTGACC AGCGTCCATA 793 

TCTTTGGACC CTTCCCGGGT GAAAAAGCCC CTTCATCCTC CAGCGCTCCC CAAGGGTGCT 853

TAGCAATACC GGGrGCTTTT CTGCCGCAAA GTGAGTTACC AAA 896 

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 130 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val Thr
1 5 10 15

Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala Val
20 25 30

Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met Asn
35 40 45

Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val Pro
50 55 60

Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe Ala
65 70 75 80

Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala Leu
85 90 95

Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu Pro
100 105 110

Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp Leu
115 120 125 

Leu Tyr 
130 

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 436 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
GTGGGGGCGT CATCATGACC TCCTCTAGGG CTCTGCAACA TGACTCCTGT GGTGCAAATC 60 

AACAA#tTTGT TCACTGATGA ATCCACAAGG ATCTCTGGGC CTACAACCAG GTCCTGGTCC 120 

ACATGACTGT CGTCTTCGGA GAAGGCACCA CTCGCCCCCG GCAGGTACGG CTGACACCTC 180 

CATGGGAGAA GACGTATCCA GGCAGCAGCT GCGCGGCCCT TCAAGAGGGC ACATCCCGTC 240 

ATCTAAAGGC ACGGTGTACT GAAGGTAGTC CTGAGACATG AGTCCGATTA CTACAGGCAC 300 

GTGTTCCTCC AGGTGGAGGC TCAGGTCCCC GGGTGAGCTG GGGCTGCAGC GGGACTCAGG 360

GCGCGGCTCT GGCTGCAGGT CTCGCAGCTC CCTGGGCTGT AGCTCCCGCA GATCCTTGCG 420 

CACACCGTTG ACTGGT 436
(2) INFORMATION FOR SEQ ID NO: 31: 
(i) SEQUENCE CHARACTERISTICS: 
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(A) LENGTH; 2180 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

TTAATAGTAC CTACATAOTA OAAAATTATA ACTCCACTTT AAAACAATGT TTTCTTTCTA 60 

TTCAAATCAA TTTAAAACTT TTTATAAACA TTAATGTTGC AAGAGAATCC AGTCCATTTA 120 

TGAAAATTAG TTGACAATCA AGTTCACCCA AGAAAATGTT GACTAAGCTA AAGAAATCAC 180 

AGATAAAACA TTTTACCAAA AGGATAGGTA ACACACAAAA AAATGCTATC ACAGGAAGCT 240 

ATGATCATCT AATATTTCTT TAATAATAAT TCTAGTTCCA TAGGTTTTCA TGTTATGCCA 300 

ATTTGTACCC GAGTTTAATT ACAGAAAAQG CAACAATTTC TAAATTGGTG GTATACATTT 360 

CTTTACAATT TTTTAATGTA AGGCCATTTA TTAAAATAGA CAAACTAGAA GATGAAAACG 420 

AAGGCAACAG AAAAATTCAA CTTTTCACAA CCAAAAGAAT TAGCACAACC TTAGAAATAA 480

TTTAGAAAAA AGTGTTGTTA AAAGATATGT TGCAGATCTC CGTTCCATTA CCCAAGATTA 540 

TGTCAATTCA CGATTCTAAA TAAATCTTTT TAAAGTAAGA GATTAAAAAC TCATCTTCAG 600 

TGTATATGTA AATTCCGTGG TTTTATCACA CAGGTATGTT TATTCAACAC TGCTTTGGAA 660 

ATGGACCATT TAAAAGGACA TOGCAATTTC CATTCTGTTA AGXTTCATTC AACCTTTACT 720 

TAGGGGTTGA TTACCACATG AAATGTGCTT TTAATGCATA AAAATCACAG TGGATTAGCC 780 

AGCAAAAGGO ACTGGGCGGG GGGGGCATTG AOGAGAATTT GATAATTCAC ATTGTGATTA 840 

TTCTGCACAT TGATGAAACA TAATTCACAC CTCTAAAACC TCAAGACTTC CCTTTTTTAA 900 

AGAACCAAAA TAAACCCAAG ACACCTTGCT GACACTTCCC CACCCCTAAA CAAACTGATG 960 

ACTCTTTTAC ACATAAAACT GAAATAGTTA TGGCAGCAAA AGATTTTGAT GGCAATGAAA 1020

GTTTGTAAAC TGTATTTCAA TCTCTTGTTC TTATTCCCAA AGTGCAAGAT GCAGGGTTCT 1080

CAATCTTTCA GTAGTGCTTC TCCTGTAAAT AATCCTTCAT TTTGTTTGGC AAAGGCAGTT 1140 

TCTGAATTAA GTCTATTCTG GTATACTQAC GTATAACAAA ACGACACAGG TACTGCAACG 1200 

AGCGCACCTA TGAACCCCGG AACACTGGTT GGCAAGTTCT GACGGAAGTG CAGATTCCAG 1260

GCAGCGAGAC CTTGAATAAC AAAAAGCTCC CATTTTCAGA GTCCCTGATT GAATGCTCCA 1320 

ATTAGATCAA CTATGGACGT ATGTCCTTCC ACATCGGCTG TTCATAAAAG CTAAACCTAC 1380 

CATTTGAGTG CTCAATTCTA GTGTGAAGTG TTTTACCATC GGAGCGAAAG TCACAGCTTA 1440 

AAAGGTAACG GTCGTCAGAA CTGTCCCGAA CAAGAAAAGA ACCATCTGGC ACGTTTGCTA 1500

GCTTCCCTTC TGCCTCCCAA CGTGTGATTG GTCCCCAGTA CCATCCTTGC TTTGCAAGTT 1560

TTTTCAGCTC CTCTGTAAGG CTTGTCACAA CCATGGGACC ACTACTTTGC ACTGAGTCAT 1620 

AAACTCTTGC AACCCCAGGA GCAGAGTTCG GATCAAAATT CAAAXGACAG CGCATAACTT 1680

TCAGCCACGT GGGGCTTTCT GTCCAGTGAG TCCACTGAAA GTTCCCCTTT GGGATTTGGA 1740 

TTATTCCTGC ATTGGAGTAA CCAATGGTGA AGATTGGAGG GACATCCATC GTGAACCCGC 1800 

TCTCCGGGGT TCTGCAACAT GACTCCCGTG GTGCCAATCA ACAAGCCATT CACCGGACTG 1860

ATCCAGOAAG ATCTCTGGGG CGACAACTAG GTCCTGGTCT ACCTGACTCT CATCCTCGGG 1920 

GAAAGCGCGC CCTCCCACTT GAGGAGGAAC CGCAGAGACT TCCATGGGAG AAGAGCTGTC 1980 

CAGACAATAG CTCCGTGATC CTTCCAAAGG ATACATCCCC TCATCTAAAG GCACAGTATA 2040

CTGAATGTAG TCCTGAGGCA TAAGTCCAAT AACGACAGGC ACATGTTCAT CCAGGTGAAG 2100 

ATGCAGGTCT CCATTATGAG AAGCCGAGCT CTTCAGTGAA TTGGCTTGCT CCTGGCACGT 2160 

GGTCTCAGAC TGGAGGTCGT 2180



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2649 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

GGCACGAGGC TGTGTCCAGC ACACAGAGAG GGCCCGGCCA TCTGCTTTGG TTCAGAGCCC 60 

TGTGTCTGTC TGTCACTTAG ACTCTTCCTC CCGGCTCGCA GCTCACCCTC CATCCTCCTT 120 

ACTGGCTCCA GCATGACTCG CTTCTCTTAT GCAGAGTACT TTGCTCTGTT TCACTCTGGC 180 

TCTGCACCTT CCAGGTCCCC TTCGTCTCCC GAGAACCCAC CGGCCCGCGC ACCCCTGGGT 240 

CTGTTCCAAG GGGTCATGCA GAAGTATAGC AGCAACCTGT TCAAGACCTC CCAGATGGCG 300 

GCTATGGACC CCGTGCTGAA GGCCATCAAG GAAGGGGATG AAGAGGCCTT GAAGATCATG 360 

ATCCAGGATG GGAAGAATCT TGCAGAGCCC AACAAGGAGG GCTGGCTGCC GCTCCACGA<^ 420 

GCTGCCTACT ATGGCCAGCT GGGCTGCCTG AAAGTCCTGC AGCAAGCCTA CCCAGGGACC 480 

ATTGACCAAC GCACACTGCA GGAAGAGACA GCATTATACC TGGCCACATG CAGAGAACAC 540

CTGGATTGCC TCCTGTCGCT GCTCCAGGCG GGGGCAGAGC CTGACATCTC TAACAAATCC 600 

AGGGAGACTC CACTTTACAA AGCCTGTGAG CGCAAGAACG CGGAGGCGGT GAGGATATTG 660 

GTGCGATACA ACGCAGACGC CAACCACCGC TGTAACAGGG GCTGGACCGC ACTGCACGAG 720 

TCTGTCTCCC GCAATGACCT GGAGGTCATG GAGATCCTAG TGAGTGGCGG GGCCAAGGTG 780 

GAGGCCAAGA ATGTCTACAG CATCACCCCT TTGTTTGTGG CTGCCCAGAG TGGGCAGCTG 840 

GAGGCCCTGA GGTTCCTGGC CAAGCATGGT GCAGACATCA ACACGCAGGC CAGTGACAGT 900 

GCATCAGCCC TCTACGAGGC CAGCAAGAAT GAGCATGAAG ACGTGGTAGA GTTTCTTCTC 960 

TCTCAGGGCG CCGATGCTAA CAAAGCCAAC AAGGACGGCC TGCTCCCCCT GCATGTTGCC 1020 
TCCAAGAAGG GCAACTATAG AATAGTGCAG ATGCTGCTGC CTGTGACCAG CCGCACGCGC 1080

GTGCGCCGTA GCGGCATCAG CCCGCTGCAC CTAGCGGCCG AGCGCAACCA CGACGCGGTG 1140 

CTGGAGGCGC TGCTGGCCGC GCGCTTCGAC GTGAACGCAC CTCTGGCTCC CGAGCGCGCC 1200 

CGCCTCTACG AGGACCGCCG CAGTTCTGCG CTCTACTTCG CTGTGGTCAA CAACAATGTG 1260 

TACGCCACCG AGCTGTTGCT GCTGGCGGGC GCGGACCCCA ACCGCGATGT CATCAGCCCT 1320

CTGCTCGTGG CCATCCGCCA CGGCTGCCTG CGCACCATGC AGCTGCTGTT GGACCATGGC 1380

GCCAACATCG ACOCCTACAT CGCCACTCAC CCCACCOCCT TTCCAGCCAC CATCATGTTT 1440 

GCCATGAAGT GCCTGTCGTT ACTCAAGTTC CTTATGGACC TCGGCTGCGA TGGCGAGCCC 1500 

TGCTTCTCCT GCCTGTACGG CAACGGGCCG CACCACCCGC CCCGCGACCT GGCCGCTTCC 1560

ACGACGCACC CGTGGACGAC AAGGCACCTA GCGTGGTGCA GTTCTGTGAG TTCCTGTCGG 1620 

CCCCGGAAGT GAGCCGCTGG GCGGGACCCA TCATCGATGT CCTCCTGGAC TATGTGGGCA 1680 

ACGTGCAGCT GTGCTCCCGG CTGAAGGAGC ACATCGACAG CTTTGAGGAC TGGGCTGTCA 1740 

TCAAGGAGAA GGCAGAACCT CCGAGACCTC TGGCTCACCT CTGCCGGCTG CGGGTTCGGA 1800 

AGGCCATAGG AAAATACCGG ATAAAACTCC TGGACACACT GCCGCTTCCC GGCAGGCTAA 1860 

TCAGATftCTT GAAATATGAG AATACACAGT AACCAGCCTG GAGAGGAGAT GTGGCCTTCA 1920 

GACTGTTTCC GGGACGCCCC AGGTGGCCTG CATCCAGGAC CCCCTGGGGT CAGAACAGGT 1980 

GTGACCTTGC TGGTTCTTTG CTGGAGCTTC ACCCAAAGTG AGAACCTGAT GTGGGGAGTG 2040 

GACGTGGAAC CTCTGCTTTC ACACTGTCAG CGGATCGCAG ACCCGCTCTG CTTCTGGCCA 2100 

TAGCCAGAGA CCTTCAACCT GGGGCCAGGG GAGAGCTGGT CTGGGCAAGG TGGCCCAGGC 2160 

AGGAATCCTG GCCTTAAGCT GGAGAACTTG TAGGAATCCC TCACTGGACC CTCAGCTTTC 2220 

AGGCTGCGAG GGAGACGCCC AGCCCAAGTA TTTTATTTCC GTGACACAAT AACGTTGTAT 2280 

CAGAAAAAAA AAAAAACATG GGCGCAGCTT ATTCCTTAGT AGGGTATTTA CTTGCATGCG 2340 

CGCTTAAAGC TACTGGAAAC ATGCGTTCCA CTATGCTTGA GAATCCCCTT GCACTGGTAA 2400

ACGAGAGCCG ACGTGCTTCA AGGTTGGATT TTTGGTTGCC CCTTTGGCGT TCCGCGGGTT 2460

TOTCCGACGT AATTGACCCC GTGTTTTGTC ACTTTCGAGT GTTCCGACTA TTGGGGGGCT 2520 

TTTGGTTGTC CCCAAAATTG TGGGTGGTGT GCGGACGCCA CGAGAAGTGG TTCATGGGCG 2580 

ATAATCATTA CTGGAGAATG TAGAGCGGCG GTTTTACGAA TAAATATTTT TTAAGCCGCC 2640 

TTCCCAAAA 2649 
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 495 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

CCTCCTGAGA GTTCGCCGGC CCGGGCCCAA TGGGTTGTTC CAAGGGGTCA TGCAGAAATA 60 

CAGCAGCAGC TTGTTCAAGA CCTCCCAGCT GGCGCCTGCG GACCCCTTGA TAAAGGCCAT 120 

CAAGGKTGCG ATGAAGAGGC CTTGAAGACC ATGATCAAGG AAGGGAAGAA TCTCGCAGAG 180 

CCCAACAAGG AGGGCTGGCT GCCGCTGCAC GAGGCCGCAT ACTATGGCCA GGTGGGCTGC 240 

CTGAAAGTCC TGCAGCGAGC GTACCCAGGG ACCATCGACC AGCGCACCCT GCAGGAGGAA 300 

ACAGCCGTTT ACTTGGCAAC GTGGAGGGGC CACCTGGACT GTCTCCTGTC ACTGCTCCAA 360 

GCAGGGCCAG AGCGGGACAT CTCCAACAAA TCCCOAGAGA ACCGCTCTAC AAAGCCTGTG 420 

AGCGCAAGAA CGCGGAAGCC GTGAAGATTC TTGGTGCAGC ACAACGCAGA CACCAACAAC 480 

GCTGCAACCG GGCTG 495 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 709 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

GTGCAGCTCT GCTCGCGGCT GAAGGAACAC ATCGACAGCT TTGAGGACTG GGCCGTCATC 60 

AAGGAGAAGG CAGAACCTCC AAGACCTCTG GCTCACCTTT GCCGACTGCG GGTTCGAAAG 120 

GCCATTGGGA AATACCGTAT AAAACTCCTA GACACCTTGC CGCTCCCAGG CAGGCTGATT 180

AGATACCXGA AATACGAGAA CACCCAGTAA CTGGGGCCAC GGGGAGAGAG GAGTAGCCCC 240 

TCAGACTCTT CTTACTAAGT CTCAGGACGT CGGTGTTCCC AACTCCAAGG GGACCTGGTG 300 

ACAGACGAGG CTGCAGGCTG CCTCCCTCTC AGCCTGGACA GCTACCAGGA TCTCACTGGG 360 

TCTCAGGGCC CAGAGCTTTG GCCAGAGCAG AGAACAGAAT GTGTCAAGGA GAAGAATCAT 420 

TTGTTTACAA ACTGATGAGC AGATCCCAGA CCTTCTCTAC CTTCAGGAAT GGCAGAAACC 480 

TCTATTCCTG GGGCCAGGGC AGAGCTTGAG GTGTTCTGGG GAAGGTGGTG CTCAGAGCCT 540 

TCCCTGTGCC CCTCCACTTG TTCTGGAAAA CTCACCACTT GACTTCAGAG CTTTCTCTCC 600 

AAAGACTAAG ATGAAGACGT GGCCCAAGGT AGGGGGTAGG GGGAGCCTGG GTCTTGGAGG 660 

GCTTTGTTAA GTATTAATAT AATAAATGTT ACACATGTGA AAAAAAAAA 709 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 848 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..624



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

TTG GAG AAG TGT GGT TGG TAT TGG GGG CCA ATG AAT TGG GAA GAT GCA 48
Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala
1 5 10 15

GAG ATG AAG CTG AAA GGG AAA CCA GAT GGT TCT TTC CTG GTA CGA GAC 96
Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp
20 25 30

AGT TCT GAT CCT CGT TAC ATC CTG AGC CTC AGT TTC CGA TCA CAG GGT 144
Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly
35 40 45

ATC ACC CAC CAC ACT AGA ATG GAG CAC TAC AGA GGA ACC TTC AGC CTG 192
Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu
50 55 60

TGG TGT CAT CCC AAG TTT GAG GAC CGC TGT CAA TCT GTT GTA GAG TTT 240
Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe
65 70 75 80

ATT AAG AGA GCC ATT ATG CAC TCC AAG AAT GGA AAG TTT CTC TAT TTC 288
Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe
85 90 95

TTA AGA TCC AGG GTT CCA GGA CTG CCA CCA ACT CCT GTC CAG CTG CTC 336
Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu
100 105 110

TAT CCA GTG TCC CGA TTC AGC AAT GTC AAA TCC CTC CAG CAC CTT TGC 384
Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys
115 120 125

AGA TTC CGG ATA CGA CAG CTC GTC AGG ATA GAT CAC ATC CCA GAT CTC 432
Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu
130 135 140

CCA CTG CCT AAA CCT CTG ATC TCT TAT ATC CGA AAG TTC TAC TAC TAT 480
Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr
145 150 155 160

GAT CCT CAG GAA GAG GTA TAC CTG TCT CTA AAG GAA GCG CAG CGT CAG 528
Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln
165 170 175

TTT CCA AAC AGA AGC AAG AGG TGG AAC CCT CCA CGT AGC GAG GGG CTC 576
Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu
180 185 190

CCT GCT GGT CAC CAC CAA GGG CAT TTG GTT GCC AAG CTC CAG CTT TGAAGAACCA 631
Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu
195 200 205

AATTAAGCTA CCATGAAAAG AAGAGGAAAA GTOAGGGAAC AGGAAGGTTG GGATTCTCTG 691 

TGCAGAGACT TTGGTTCCCC ACGCAAGCCC TGGGGCTTGG AAGAAGCACA TGACCGTACT 751 

CTGCGTGGGO CTCCACCTCA CACCCACCCC TGGGCATCTT AGGACTGGAG GGGCTCCTTG 811 

GAAAACTGGA AGAAGTCTCA ACACTGTTTC TTTTTCA 848



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 207 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala
1 5 10 15

Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp
20 25 30

Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly
35 40 45

Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu
50 55 60

Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe
65 70 75 80

Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe
85 90 95

Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu
100 105 110

Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys
115 120 125

Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu
130 135 140

Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr
145 150 155 160

Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln
165 170 175

Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu
180 185 190

Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu
195 200 205



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 464 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
GTTCCAAGCC TAACCCAXCT TTGTCGTTTG GAAATTCGGG CCAGTCTAAA AGCAGAGCAC 60

CTTCACTCTG ACATTTTCAT CCATCAGTTG CCACTTCCCA GAAGTCTGCA GAACTATTTG 120

CTCTATGAAG AGGTTTTAAG AATGAATGAG ATTCTAGAAC CAGCAGCTAA TCAGGATGGA 180

GAAACCAGCA AGGCCACCTG ACACAGGTCC TTTAATTCTG TTTAGTCACA AAAGACGGCT 240

TGTGTGACTG TTTGGATTTG GTGATCAAAT GTCCATGTTT ACAGTTGCTT TTCCCAGTTT 300

GTGTCTTTCC CAATATTGTG AACCTTATCC ATCTTGCCTT ACTCAGTTTT ATTTCTAGTG 360

CACTTTGTTG TGTATTATTT GTTTACCTGA CCATTTTCTA CTTTATTCTG CTAATAAACT 420

GTAATTCTGA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 464



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 747 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

GGGGATCGAA AGCGGGGGCT TCTGGGACGC AGCTCTGGAG ACGCGGCCTC GGACCAGCCA 60 

TTTCCGTGTA GAAGTGOCAG CACGGCAGAC TGGTCAAACA AATGGATTTT ACAGAGGCTT 120

ACGCGGACAC GTGCTCTACA GTTGGACTTG CTOCCAGGGA AGGCAATGTT AAAGTCTTAA 180 

GGAAACTGCT CAAAAAOOGC CGAAGTGTCG ATGTTGCTGA TAACAGGOGA TGCATGCCAA 240 

TTCATGAAGC AGCTTATCAC AACTCTGTAG AATGTTTGCA AATGTTAATT AATGCAGATT 300 

CATCTGAAAA CTACATTAAG ATGAAGACCT TTGAAGGTTT CTGTGCTTTG CATCTCGCTG 360 

CAAGTCAAGG ACATTGGAAA ATCGTACAGA TTCTTTTAGA AGCTGGGGCA GATCCTAATG 420 

CAACTACTTT AGAAGAAACG ACACCATTGT TTTTAGCTGT TGAAAATGGA CAGATAGATG 480 
TGTTAAGGCT GTTGCTTCAA CACGGAGCAA ATGTTAATCG ATCCCATTCT ATGTGTGGAT 540

GGAACTCCTT GCACCAGGCT TCTTTTCAGG AAAATGCTGA GATCATAAAA TTGCTTCTTA 600

GAAAAGGAGC AAACAAGGAA 1GCCAGGATG AOTTTOGAAT CACACCTTTA TTTGTGGCTG 660

CTCAGTATGG CCAAGCTAGA AAGCTTTGAA GCATACTTAT TTCATCCGGG TGCAAATGTC 720

AATTCTCAAG CCTTGGACAA AGCTACC 747



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1018 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

CACAAATGGG ACCATACAAA AATCTTGGAC TTGTTAATAA CCACTTACTA ACCGGGACCT 60 

GTGACACTGG GCTAAACAAA GTAAGTCCCT GTTTACTCAG CAGTGTTTGG GGGACATOAA 120 

GGATTeeCTA GAAATATTAC TCCGGAATGG TCTACAGCCC AGACGCCCAG GCGTGCCTTG 180

TTTTTGGATT CAGTTCTCCT GTGTGCATGG CTTTCCAAAA GGAGGTGGAG CTGTAGTTCT 240 

TTGGAATTGT GAACATTCTT TTGAAATATG GAGCCCAGAT AAATGAACTT CATTTGGCAT 300 

ACTGCCTGAA GTACGAGAAG TTTTCGATAT TTCGCTACTT TTTGAGGAAA GGTTGCTCAT 360 

TGGGACCATG GAACCATATA TATGAATTTG TAAATCATGC AATTAAAGCA CAAGCAAAAT 420 

ATAAGGAGTG GTTGCCACAT CTTCTGGTTG CTGGATTTGA CCCACTGATT CTACTGTGCA 480 

ATTCTTGGAT TGACTCAGTC AGCATTGACA CCCTTATCTT CACTTTGGAG TTTACTAATT 540 

GGAAGACACT TGCACCAGCT GTTGAAAGGA TGCTCTCTGC TCGTGCCTCA AACGCTTQGA 600 

TTCTACA(3CA ACATATTGCC CACTGTTCCA TCCCTGACCC ATcrrTGTCQ TTTCGAAATT 660

CGGTCCAGTC TAAAATCAGA ACGTCTACGG TCTGACAGTT ATATTAGTCA GCTGCCACTT 720

CCCAGAAGCC TACATAATTA TTTGCTCTAT GAAGACGTTC TGAGGATGTA TGAAGTTCCA 780

GAACTGGCAG CTATTCAAGA TGGATAAATC AGTGAAACTA CTTAACACAG CTAATTTTTT 840

TCTCTGAAAA ATCATCCAGA CAAAAGAGCC ACAGAGTACA AGTTTTTATG ATTTTATAGT 900

CAAAAGATGA TTATTGATTG TCAGATAGGT TAGGTTTTGG GGGGCCAGTA GTTCAGTGAG 960

AATGTTTATG TTTACAACTA GCCTTCCCAG TAAAAAAAAA AAAAAAAAAA AAAAAAAA 1018

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1897 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

CGGGGGeCTG GGACCTGGGG CGTAACCGTC TCTACCACGA CGGCAAGAAC CAGCCAAGTA 60 

AAACATACCC AGCCTTTCTG GAGCCGGACG AGACATTCAT TGTCCCTGAC TCCTTTTTCG 120 

TGGCCCTOGA CATGRATGAT GGGACCTTAA GTTTCATCGT GGATGGACAG TACATGGGAC 180 

TGGCTTTCCG GGGACTCAAG GGTAAAAAGC TGTATCCTGT AGTGAGTGCC GTCTGGGGCC 240 

ACTGTGAGAT CCGCATGCGC TACTTGAACG GACTTGATCC TGAGCCCCTG CCACTCATGG 300 

ACCTGTGCCG GCGTTCGGTG CGCCTAGCGC TGGGAAAAGA GCGCCTGGGT GCCATCCCCG 360 

CTCTGCCGCT ACCTGCCTCC CTCAAAGCCT ACCTCCTCTA CCAGTGATCC ACATCCCAGG 420 

ACCGCCATAC GACAGCCATC TGGTGCCAAR TCACTGAGCC CGTTGGGGTC CGCCGACCCC 480 

TGCGCCTGGG ATGGAAGCCC ACCTCAGCCA TGGGCAGACG TGCCCCCTCA TCCTACCGGC 540 
TGCCTCTGCT GGGGGAACCT ATGCCAACGG ACTTCTCCCT TCCCAACACT GGCTGAAGCA 600

GCAGCACCCA GGCCCTTCCC I'GAACCAGAT GCAGAGAATA AACTATGAAA ACCTCTCTCA 660 

GGCGCCTTCT GCTCTCAGGT GGAGTGGQCV GCCCCCCACT CTCTGCAGAG AGAGGCTACA 720 

CCCACCTGGG GOGTCCTGGG AGGTAAGACT AGTAGCAGGT GCCAGGGCTG ARTCCAAAAG 780 

CAGGAATGGC CAGGAKCAGG CCATACAGAT GAAGCTCAGG ATGTCACATA CCATGGACAM 840 

rGAGACAGAA CCCCAGGTTG GAMTTCCCTT GGGCCAACGA GTGCCAGCTT TAATGTCAGC 900 

TGCHGGTGCT CTGTGGCCTG TATTTATTCT TTAAACAGTA GCAAAGGCCA TTTATTTATT 960 

CCACTTAGAA AGGAAACCTT GGTGGGTGGY TTCCCTCGAT GTGCTTTCCC CCACCTCCCT 1020 

GGAATGTGTG TGCCACACCT GTCCTTGTCC CAGGCCAGGA CTGTGGCACA TGAGCTGGTG 1080 

TGCACAGATA CACGTATGTC GTCGTGCATG ACCCCTGACT AGTTCCTAAG TAGCCCTGCA 1140 

CCAAGCACCA GAGCAGACCC CAAGAGAGGC CCGTGCAAGT CCCCATGTCC CCAGGTCCCT 1200 

GCTTCTGTTG CCTTGGGACT CATACACCGG CACACGTCTT TCAGCCTCTT GACTTCCATG 1260 

AGCTTCGAAT TTTGCCCCCG ATTCTTCTGA TATTTCCCAT TGGCATCCTC CAAAGCTCTG 1320

GGCCTGGAGG GCATTAGGAC ACA1GGAATG AGTGGGGTCT CCAGCCCCTG GGAAAGCCAC 1380 

TGGCAAQGCA GGATTAGAAA GACCAAGAGC AGGGTGGGGC GCCATGAAGC CTGTATGCCT 1440 

CTCAGGCTCA AGACCCCGCC ACACACCCAC TCAAGCCTCA GAAGTGGTGT GTAGGGCAGC 1500 

CCCAGGAGAG GAATGCCTGT CCTAGCAGCA CGTACATGGA GCACCCCACA TGTGCTCCAG 1560 

CCCTCTGGCT GTTTCTCTTG CTCTAGAATC AACTCCCTAC ATTGGGAATG TAGCCATTTG 1620 

GTAGAGGACT TGCCTAGCCT GCAGGAAGCT CACGTTCCAT CCCCTGCACC AAGGAGAATC 1680 

AAAGCTCAGG AGGCTGAGGC AGGAGGATTG CTG^CAGTGG TGTACACAGG TCATGGCCAT 1740 

CCTGGGCTAT ATTAAACCTT GTCCTTTAAG AAAAAGAAAA GAAATCAACT TCCATTGAAT 1800 

CTGAGTTCTG CTCATTTCTG CACAGGTACA ATAGATGACT TKATTTGTTG AAAAATGKTT 1860 

AATATATTTA CMTATATATA TATTTGTAAG AAGCATT 1897 
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Gly Gly Trp Asp Leu Gly Arg Asn Arg Leu Tyr His Asp Gly Lys Asn
1 5 10 15

Gln Pro Ser Lys Thr Tyr Pro Ala Phe Leu Glu Pro Asp Glu Thr Phe
20 25 30

Ile Val Pro Asp Ser Phe Phe Val Ala Leu Asp Met Xaa Asp Gly Thr
35 40 45

Leu Ser Phe Ile Val Asp Gly Gln Tyr Met Gly Val Ala Phe Arg Gly
50 55 60

Leu Lys Gly Lys Lys Leu Tyr Pro Val Val Ser Ala Val Trp Gly His
65 70 75 80

Cys Glu Ile Arg Met Arg Tyr Leu Asn Gly Leu Asp Pro Glu Pro Leu
85 90 95

Pro Leu Met Asp Leu Cys Arg Arg Ser Val Arg Leu Ala Leu Gly Lys
100 105 110

Glu Arg Leu Gly Ala Ile Pro Ala Leu Pro Leu Pro Ala Ser Leu Lys
115 120 125

Ala Tyr Leu Leu Tyr Gln
130 



(2) INFORMATION FOR SEQ ID NO: 42:




(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 265 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
AAGGGTAAAA AACTGTATCC TGTAGTGAGT GCCGTCTGGG GCCACTGTAG ATCCGAATGC 60 
GCTACTTGAA CGGACTCGAT CCCGAGACTG CCGCTCATGG ATTTGTGCCG TCGCTCGGTG 120 
CGCCTGGCCC TGGGGAGGGA GCGCCTGGGG GAGAACCACA CCTGCCGCTG CCGGCTTCCC 180 
TCAAGGCCTA CCTCCTCTAC CAGTGACGTT CGCCATCATA CCGCCAGCGC GACAGCCACC 240 
TGGTGCCAAC TCACTGAGCC GCCTG 265

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2438 base pairs

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

AAGTGGCGGC GGTCCCTGGA GAGCAGGCGG AGGCAGCGGC AAGTCTGACT CTGGGCTGAC 60 

CGTGGAGCCG GGGCGGGGGC TGACAGCCAG GCCTCCGCCT GGCGCGAGCC GCACGAGGAG 120 

CGGGAGTGGC CGGGCCTCTC TTCCGCGCTT GAGCGAGCGC CGGGTGATGG CGGTGGTGAT 180 

GGCGGCAGGC GCTCGGACAG CTCCGCTTGA GCTGAGCTCG GAGAGATCCG TCCAGAAAGT 240 
GCCCAGAAGA AACTTCCTCT TAGAAAAGCT GAAAAACACA RTATTTATAA CACTGGAAAT 300

TGTAAAGAAT TTGTTTAAAA TGGCTGAAAA CAATAGTAAA AATGTAGATG TACGGCCTAA 360 

AACAAGTCGG AGTCGAAGTG CTGACAGGAA GGATGGTTAT GTGTGGAGTG GAAAGAAGTT 420 

GTCTTGGTCC AAAAAGAGTG AGAGTTGTTC TGAATCTGAA GCCATAGGTA CTGTTGAGAA 480 

TGTTOAAATT CCTCTAAGAA GCCAAGAAAG GCAGCTTAGC TGTTCGTCCA TTGAGTTGGA 540 

CTTAGATCAT TCCTGTGGGC ATAGATTTTT AGGCCGATCC CTTAAACAGA AACTGCAAGA 600 

TGCGGTGGGG CAGTGTTTTC CAATAAAGAA TTOTAGTGGC CGACACTCTC CAGGGCTTCC 660 

ATCTAAAAGA AAGATTCATA TCAGTGAACT CATGTTAGAT AAGTGCCCTT TCCCACCTCG 720 

CTCAGATTTA GCCTTTAGGT GGCATTTTAT TAAACGACAC ACTGTTCCTA TGAGTCCCAA 780 

CTCAGATGAA TGGGTGAGTG CAGACCTOTC TGAGAGGAAA CTGAGAGATG CTCAGCTGAA 840 

ACGAAGAAAC ACAGAAGATG ACATACCCTG TTTCTCACAT ACCAATGGCC AGCCTTGTGT 900 

CATAACTGCC AACAGTGCTT CGTGTACAGG TGGTCACATA ACTGGTTCTA TGATGAACTT 960 

GGTCACAAAC AACAGCATAG AAGACAGTGA CATGGATTCA GAGGATGAAA TTATAACGCT 1020 

GTGCACAAGC TCCAGAAAAA GGAATAAGCC CAGGTGGGAA ATGGAAGAGG AGATCCTGCA 1080 

GTTGGKGGCA CCTCCTAAGT TCCACACCCA GATCGACTAC GTCCACTGCC TTGTTCCAGA 1140 

CCTCCTTCAG ATCAGTAACA ATCCGTGCTA CTGGGGTGTC ATGGACAAAT ATCCAGCCGA 1200 

AGCTCTGCTG GAAGGAAAGC CAGAGGGCAC CTTTTTACTT CGAGATTCAG CGCAGGAAGA 1260 

TTATTTATTC TCTGTTAGTT TTAGACGCTA CAGTCGTTCT CTTCATGCTA GAATTGAGCA 1320 

GTGGAATCAT AACTTTAGCT TTGATGCCCA TGATCCTTGT GTCTTCCATT CTCCTGATAT 1380 

TACTGGGCTC CTGGAACACT ATAAGGACCC CAGTGCCTGT ATGTTCTTTG AGCCGCTCTT 1440 

GTCCACTGCC TTAATCCGGA CGTTCCCCTT TTCCTTGCAG CATATTTGCA GAACGGTTAT 1500 

TTGTAATTGT ACGACTTACG ATGGCATCGA TGCCCTTCCC ATTCCTTCGC CTATGAAATT 1560 

GTATCTGAAG GAATACCATT ATAAATCAAA AGTTAGGTTA CTCAGGATTG ATGTGCCAGA 1620




GCAGCAGTGA TGCGGAGAGG TTAGAATGTC GACCTGCATA CATATTTTCA TTTAATATTT 1680

TATTTTTCTT ATGCCTCTTT GAATTXTTGT ACAAAGGCAG TTGAATCAAA TAAAACTGTG 1740 

CCCTAAGTTT TAATTCCAGA TCAATTTATT TTTTTTATGA TACACTTGTT ATATATTTTT 1800 

AAGCAGGTGT TTGGTTTTGT TTTTACCATA TAAATTTACA TATGGTCCAG GCATATTTAC 1860

AATTTCAAGG CATTOCATAT ACATTTGAAT ATTCTGTATT OTTTAAATAA TCTTTTOTTC 1920 

TTTCCTATGT GTGAAATATT TTGCTAATCT ATGCTATCAG TATTCTTGTA TGACCGAATA 1980 

GTTACCTATT CTCTTTTCAT CTTGAAGATT TTCAGTAAAG AGTGTTGTAA TCAATCCATT 2040

ATAATGTAAT TGACTTTTGT AATTTGCCAA TAGGAGTGTT AAACAACAAA ATGATTTAAA 2100 

ATGAAACTTA ATGTATTTTC ATTTTAAATA TTAACTAAAC CAAGTTTGTT TGTTAGTTAT 2160 

TCTAGCCAAT AAGAAAAGAG AATGTAGCAT CCTAGAGGTG TATTTGTTCT GCAGTTTGGC 2220 

AGGACCGTCA GTTAGTCCAA ATAAACATCC CCTCAGCGTG GAGGCGAATG GAACCTGTGC 2280 

TCCTTTCTTA CGGGAAGCTT TGCAAAGCAA AATAGCAGGG TTACAAGCTT OGAGTTGTTA 2340 

AGGCAACTAG AGTTTTCTCT ATTAATTTAT AGACTGTTGT TGCACCTACT TAGCTCTT7T 2400 

TTGGGAACTC TAGTTCCCAG GGGAAAATAC CTCGTGCC 2438

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:



Ser Gly Gly Gly Pro Trp Arg Ala Gly Gly Gly Ser Gly Lys Ser Asp
1 5 10 15

Ser Gly Leu Thr Val Glu Pro Gly Arg Gly Leu Thr Ala Arg Pro Pro
20 25 30

Pro Gly Gly Ser Arg Thr Arg Ser Gly Ser Gly Arg Ala Ser Leu Pro
35 40 45

Arg Leu Ser Glu Arg Arg Val Met Ala Val Val Met Ala Ala Gly Ala
50 55 60

Arg Thr Ala Pro Leu Glu Leu Ser Ser Glu Arg Ser Val Gln Lys Val
65 70 75 80

Pro Arg Arg Asn Phe Leu Leu Glu Lys Leu Lys Asn Thr Xaa Phe Ile
85 90 95

Thr Leu Glu Ile Val Lys Asn Leu Phe Lys Met Ala Glu Asn Asn Ser
100 105 110 

Lys Asn Val Asp Val Arg Pro Lys Thr Ser Arg Ser Arg Ser Ala Asp 
115 120 125 

Arg Lys Asp Gly Tyr Val Trp Ser Gly Lys Lys Leu Ser Trp Ser Lys
130 135 140

Lys Ser Glu Ser Cys Ser Glu Ser Glu Ala Ile Gly Thr Val Glu Asn
145 150 155 160

Val Glu Ile Pro Leu Arg Ser Gln Glu Arg Gln Leu Ser Cys Ser Ser
165 170 175

Ile Glu Leu Asp Leu Asp His Ser Cys Gly His Arg Phe Leu Gly Arg
180 185 190

Ser Leu Lys Gln Lys Leu Gln Asp Ala Val Gly Gln Cys Phe Pro Ile
195 200 205 

Lys Asn Cys Ser Gly Arg His Ser Pro Gly Leu Pro Ser Lys Arg Lys 
210 215 220 

Ile His Ile Ser Glu Leu Met Leu Asp Lys Cys Pro Phe Pro Pro Arg
225 230 235 240

Ser Asp Leu Ala Phe Arg Trp His Phe Ile Lys Arg His Thr Val Pro
245 250 255

Met Ser Pro Asn Ser Asp Glu Trp Val Ser Ala Asp Leu Ser Glu Arg
260 265 270

Lys Leu Arg Asp Ala Gln Leu Lys Arg Arg Asn Thr Glu Asp Asp Ile
275 280 285

Pro Cys Phe Ser His Thr Asn Gly Gln Pro Cys Val Ile Thr Ala Asn
290 295 300 

Ser Ala Ser Cys Thr Gly Gly His Ile Thr Gly Ser Met Met Asn Leu
305 310 315 320

Val Thr Asn Asn Ser Ile Glu Asp Ser Asp Met Asp Ser Glu Asp Glu
325 330 335

Ile Ile Thr Leu Cys Thr Ser Ser Arg Lys Arg Asn Lys Pro Arg Trp
340 345 350

Glu Met Glu Glu Glu Ile Leu Gln Leu Glu Ala Pro Pro Lys Phe His
355 360 365

Thr Gln Ile Asp Tyr Val His Cys Leu Val Pro Asp Leu Leu Gln Ile
370 375 380

Ser Asn Asn Pro Cys Tyr Trp Gly Val Met Asp Lys Tyr Ala Ala Glu
385 390 395 400

Ala Leu Leu Glu Gly Lys Pro Glu Gly Thr Phe Leu Leu Arg Asp Ser
405 410 415

Ala Gln Glu Asp Tyr Leu Phe Ser Val Ser Phe Arg Arg Tyr Ser Arg
420 425 430

Ser Leu His Ala Arg Ile Glu Gln Trp Asn His Asn Phe Ser Phe Asp
435 440 445 

Ala His Asp Pro Cys Val Phe His Ser Pro Asp Ile Thr Gly Leu Leu
450 455 460

Glu His Tyr Lys Asp Pro Ser Ala Cys Met Phe Phe Glu Pro Leu Leu
465 470 475 480

Ser Thr Pro Leu Ile Arg Thr Phe Pro Phe Ser Leu Gln His Ile Cys
485 490 495

Arg Thr Val Ile Cys Asn Cys Thr Thr Tyr Asp Gly Ile Asp Ala Leu
500 505 510

Pro Ile Pro Ser Pro Met Lys Leu Tyr Leu Lys Glu Tyr His Tyr Lys
515 520 525

Ser Lys Val Arg Leu Leu Arg Ile Asp Val Pro Glu Gln Gln
530 535 540 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4999 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CCCTCTGGGC AAGCCGCCCC CCCCCCACCC ATCTACCACA CACACACACA CACACACACA 60

CACACATTO^ GACCTTGGGG CAAAAACAAA GCAAAATAAC AACAACAAAA ACACTGCCTG 120 

TGGAAAGTCC TTACTTCAGG AAGGTTGGCA GATGAGGAGC AAGGGAACAT TTTATCAGGA 180 

CTGCCACAAA GGAGTCTTTT TTTTTAATGG TTTTTCAAGA CAGGGTTTCT CTGTATAGCC 240 

CTGGCTGTCC TGGAGCTCAC TTTGTAGACC AGGCTGGCCT CGAACTCAGA AATTCGCCTG 300 

CCTCTGCCTC CTGAGTGCTG GGATTAAAGG CGTGCAGCAC CATGTCCAAC TGGCATTTTC 360 

TCAATTAAGG TTCGTTCCTT TCAGATAACT CTAGGTTCTG GGTCAAGCTG ACACAAGGCT 420 

ACACAGCACA GTTTGTATGC CACATTCAGT TCAGAAGACA CCCAACCTCC CTGGAACTGG 480 

AACTTATGCA CATTTGTGAG CTTCCACTTG GGAGTGGGAA CCTGAACTGG GTCCTCTGCA 540 

AGAGCAGCCG TGCTCTTAAC TGCTGAGCCA TTTCAGCAGC CTCACATCAG AATTAAGTTA 600 
GAAATTAGCCG GGTATGAATC ATACCCTTAG AATCCTAGCA TCTGAAAGCA GAGCTAAGAG 660
AAACAGGGAT TCAAGACCAG CTCTTGGCTA CAGAGCCCGT CCTGTCCTAG GATGGGCTAC 720 

AAGAGACTAT TTCAAAGCCA TCCAAACAAC AATAACTACA ACAACAACAA GGTTAAAATT 780 
AGGCTGGGCA CAGGGTACAC ACCTTTAATG CCAACACTCA GGAGGCAGAG GCAGGCTGAT 840 

CAGTGTGAGT TTGAGTTCAA CGTGGTCTAC ATAGGGAGTT CTAGGCCAGC AGAGGTTACA 900 

GTCTCTCTCT CTCTCTCTCT CTCTCTCTCT CTCTCACACA CACACACACA CACACACACA 960 

CACACACACA CACACACGGT GGCATTATGG GATTTTTTTG GGATAAGGTT TCTCTGTCTA 1020 

GCCCTGGCAT AGATTCACTC TGTAGACTAG GCTAGCCTTG AACTCAGAGA TCCGCCTGCC 1080

TCTGCCTCCC AAGTGCTGGG ATTATAGGTG TTGCACCACC ACTGCCCAGC CACTTTGGGA 1140 

TTTTTGAACT GTTATCAAGA GGCTTTCGAG GAGGTCAAAC TTCAACAGCA ACCTCTCCAT 1200 

GATAATGTAG CTAATGATCA AACGACACTC AAAACTTAAC CCTTAAAGCA CACATCCACC 1260 

AGACAGCGTG CCCACTCGTA GTTCCATTAC TCAGGAGGCT GAAGCAGGAG GATGAAGGAC 1320 

TAAGGCTTCA GCAACCTAGG GAGCCGCAGG GGACAGTAGT CTCAATCCCT ACATTCTCCT 1380

GAACACAGGA GCAGGAGTTC AGGAAGGGTG TCAAGGCCGC TTACTGATCT TAGGGCCTCA 1440 

GOAATGACTA GCTCAGGCAG AGAGAGCAAA GGTCTCCAGT GGAGAAGTCT ACACACACAC 1500

ACACACACAC ACACACACAC ACACACACAC AGAATCCAAG GCGATGACGT CATCAAAGGG 1560 

TTAATTCl^AG TCTGGGATGG GGGGGAGGGT GGGGCACGCA GCTGTCAGGT GGCTTTGGAA 1620 

AAATAAACTG CTGAAGAGTC TGACGCCAGG GAGTCCTGGG AGGGACAAGA GGTTACCCAC 1680

TCAAAGAGTG TGCTCCACAA AGCATGCGCG CTTGTCCACG TCTGGAGTCG TCACTTATTT 1740 

TTTGCCTGGA TTGTTTGTAG CCGGTGGGTT CTCAAGGCGG TAAGTGGTGT GGCCGCCGTG 1800 

GTCTGGGAGC TGACGATAGG GTTAATCGTC CACAGAGCCC AGGGGCGGAG CGCGGGCGGG 1860 

CGTCCGCAGC CCCGCTGGAG CCGGAAGCAG TGGCTGGTCA GGGGCGCTTC TAGCCTTCCC 1920 

TATCTGTACT TCCACAGAGG TCTCTGCGAG CTAGGGGGAC AGTGAGGTGC GGGGTAGGGG 1980 
CCCGGCGTTA GAGCCAGCAA GGGGACGGTT CACGGTAAGG TCTGAGGGAG AGAGAGCTCC 2040

TGAGAAACTT GGGGGGCGCG ACACAGATAG GGTGAAAGCA GAGTGATAGA CCTGGGATGG 2100 

TTAGGGGACC AAGGGAAGAC CAGGCTGGTT GGCATACACC GGTGAACGGA TGGGAGTCCT 2160 

AGGGAAAGAT GATGCGCCTA ACAGTCCTTT CTGTCTCCAC ACCACTCCAG GGGACGATCC 2220 

GGAGCTCAAC TTTCAAAAGC GAGACGCCCC AGCAAGCCTG TTTTGAGAAG TTCTTCAGCG 2280 

GCTCTCCTCA TGGGCCAGAC GGCCCTGGCA AGGGGCAGCA QCAGCACCCC TACCTCGCAG 2340 

GCTCTGTACT CGGACTXCTC TCCTCCCGAG GGCTTGGAGG AGCTCCTGTC TGCTCCCCCT 2400 

CCTGACCTGG TTGCCCAACG GCACCACGGC TGGAACCCCA AGGATTGCTC CGAGAACATC 2460 

GATGTCAAGG AAGGGGGTCT GTGCTTTGAG CGGCGCCCTG TGGCCCAGAG CACTGATGGA 2520 

GTCCGGGGGA AACGGGGCTA TTCGAGAGGT CTGCACGCCT GGGAGATCAG CTGGCCCCTG 2580 

GAGCAAAGGG GCACACACGC CGTGGTGGGC GTCGCCACCG CCCTCGCCCC GCTGCAGGCT 2640 

GACCACTATG CGGCGCTTTT GGGCAGCAAC AGCGAGTCCT GGGGCTGGGA TATTGGGC6G 2700 

GGAAAATTGT ATCATCAGAG TAAGGGCCTC GAGGCCCCCC AGTATCCAGC TGGACCTCAG 2760

GGTGAGCAGC TAGTGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGGACT 2820 

CTTGGCPACT CTATTGGGGG CACGTACCTG GGACCAGCCT TCCGTGGACT GAAGGGGAGG 2880 

ACCCTCTATC CCTCTGTAAG TGCTGTTTGG GGCCAGTGCC AGGTCCGCAT CCGCTACATG 2940 

GGCGAAAGAA GAGGTGAGAT ACGGACTAGG TGTGGGGAGA TCACTACTCT TGGCAATGGT 3000 

TTGGGCTGGA AACTCATGGT TGGAGCACAG GAAGTAGGCT TCTTGTCACT TTGGCCTGTC 3060 

ACTTAGATGG CCTTGGATCT AGCTTCACTC CCAATCCCTA TTGGATGTGA TGCACAAATT 3120 

CAGAGCCTTT GGGTCTCCCT CAGCTGAGGT GGCGGTGGAA ATGGAGGAAG AAGGAAGGGT 3180 

GCCTGAGCAG GATCTCAAGT TCAAGGATGC CTGGAGTTGC TTACTTACCT TGTCTTCCTT 3240 

CTCTCTCCGC AGTGGAGGAA CCACAATCCC TTCTGCACCT GAGCCGCCTG TGTGTGCGCC 3300

ATGCTCTGGG GGACACCCGG CTGGGTCAAA TATCCACTCT GCCTTTGCCC CCTGCCATGA 3360 
AGCGCTATCT GCTCTACAAA TGACCCAGTA GTACAGGGTG TGCTGGCACC CTACCGTGGG 3420

GACAGGTGGA GAGGCACCCG CTGGCCTAGA CAACTTTAAA AAGCTGGTGA AGCTGGGGGG 3480 

GGGGGGCTGG ACCCCTTCAC CTCCCCTTCT CACAGGAGCA AGACATATAG AAATGATATT 3540 

AAACACCATG GCAGCCTGGG ACAAAGAGGT TTTTGAAGTA AAAAATGAGA TGTATTGTCA 3600 

CAACCTGTTT CATTATTGTT TTTTGTTTTG TTTTACACTC CCCCACCCCA GOCTAGAGCC 3660

CCATCACTGT CTTAAGGAAT TATGACAACC CACAAAGCTC AGGCCCAGGT GTTTATTTCC 3720 

CTTACATGTA GGATGGTTCA CAAACACAAT ACAGGGGCTT TGGCACCGTG GGGGAGGGGA 3780 

CTATCCCAGG CCTCTTAGGG TCTCATGTAT ACCGAATTCA GACCCGAAAG CTCTGAATTT 3840 

CTGCATCAGA CATCCAGTAG AACTTGGGAG TGAAGCTAGA GCCAAGGCCA TCTAAGTGAC 3900 

AGGCCAAAGT GACACGAAGC CCACTTCCTG TGCTCCAACC ATGAGTTTCC AGCCCAAACC 3960 

AATGGAAGGT GATTTCACTT GTCAGGGCCC AAAGGGACAG TCAGTTCTAC TCCCTCCCCT 4020 

CACTAGGAGC CACCTTGGTG ACAGTTGATT CTACCCACTG TAAGTGGTAA AGGGATTGGC 4080 

CTGGTCCCAA CCATAATAGG GCGGTGGAAA CGGCTCAGGA GGGTACAGCG TGGAXTAGGC 4140 

CACAAGATGG GGCAGATGAT GTCATCAGAA GCATGTGACC GGTGGGAGCA GTTACTAAAC 4200 

TTCTGGfiCAA CCTAGTCCAT GCTATGCAGG CAGGTAGAGG GATGGGCAGT GCTCATTGTT 4260 

TGGCATTGAT GATGTCCACA AATTCAGGCT TGAGAGATGC GCCACCCACA AGGAAGCCGT 4320 

CCACGTCAGG CTGGCTTGCC AGCTCTTTGC AGGTTGCTCC AGTCACAGAA CCTGTACCAG 4380 

GAACAAGAAG ACAGTTTGGT CAGGTCTATG ATCAGAACAC TTAAGCCCCA CCTCTCTGTG 4440 

CAAGGCAGCC TCAGTCTGTC TTAGCCCATT TCCGTCTTAG CTAGAGCCAA AGCCACTCAC 4500 

CTCCATAAAT GATCCGGGTG CTCTGAGCCA CCCCATCATT GACATTGGAT TTCAGCCATC 4560 

CCCGGAGCTT CTCGTGTACT TCCTGTGCCT AGAAGGAGGA GGCAGAGCTA CTAAGTAAGC 4620 

TCCTTCCTAT CTATCATTCA AGGAGTAAAA ACCACTGGTT CTCACATAGA GTTGAGTTTC 4680 

CAGAAAAGCC CCGGGACCAG AGAGTGGCAA GGCTCCAATC CCACCAGGCT TGGAATGAAC 4740 
ATTTTTGGCA AAGTCACTCT CCTTGGTGAG TTTGGGGGCC CTCTGTCTCT AAAGGGGCTT 4800 

GGATGGGCTC CATAGCTGTG TGAGTCTGTT AAAGCCGGAC AGGCTGAGGA GCTCTGGGTA 4860 

GTTACCTGCT GAGGGGTTGC CGTCTTGCCA GTCCCAATGG CCCACACAGG TTCATAGGCC 4920 

AGGACCACCT TGCTCCAGTC TTTCACATTA TCTGTGGGGC AGAGAGGAGA GTGAGTAGGA 4980 

AGGAGCTGAC CCGCCAAGC 4999 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 264 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Met Gly Gln Thr Ala Leu Ala Arg Gly Ser Ser Ser Thr Pro Thr Ser 
1 5 10 15 

Gln Ala Leu Tyr Ser Asp Phe Ser Pro Pro Glu Gly Leu Glu Glu Leu 
20 25 30 

Leu Ser Ala Pro Pro Pro Asp Leu Val Ala Gln Arg His His Gly Trp 
35 40 45 

Asn Pro Lys Asp Cys Ser Glu Asn Ile Asp Val Lys Glu Gly Gly Leu 
50 55 60 

Cys Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Val Arg Gly 
65 70 75 80 

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro 
85 90 95 

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu 
100 105 110 

Ala Pro Leu Gln Ala Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser 
115 120 125 

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser 
130 135 140 

Lys Gly Leu Glu Ala Pro Gln Tyr Pro Ala Gly Pro Gln Gly Glu Gln 
145 150 155 160 

Leu Val Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly 
165 170 175 

Thr Leu Gly Tyr Ser Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg 
180 185 190 

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ser Val Ser Ala Val Trp Gly 
195 200 205 

Gln Cys Gln Val Arg Ile Arg Tyr Met Gly Glu Arg Arg Val Glu Glu 
210 215 220 

Pro Gln Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Ala Leu 
225 230 235 240 

Gly Asp Thr Arg Leu Gly Gln Ile Ser Thr Leu Pro Leu Pro Pro Ala 
245 250 255 

Met Lys Arg Tyr Leu Leu Tyr Lys 
260 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5615 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
GTACTTTCTT TATATCTCCA TAATTTTATT TACTATTACT ACATGATACA TTATTTTATA 60 

AAAGTCTTTG TAACCTCCTT AAGGATTCAC TGCTTAATCT CCAGTGCTTA GCACAMTCA 120 

TTAAATGCGA ACCAGAAACT CTTCCAAAXG TGTTACATCT ATAACCTCAT TGGATTCTCA 180 

CTACCAACCC CATGCAATAG ATACTAATGT GATCTCTGTC TTACAGAGGA AGAAACAGGC 240 

ACAGGGAGGT TCAGTAATTT GCCCAAGGTC ATACACACAC TGGCCTTCAG GTATTCATGC 300 

CCGGGGAGTC TGGTCCCACA GCTGGCATGT TTGCCATTAT ATTATATTGC CTCCTTATAG 360 

TGTCGCCACT CATTAAGCAC ATTGACAGCT ATGCTTGGTG AGTGACTACT ATGTACCCAG 420 

CTCTGTGCTA CATGCTTTAC CTGGATTATT TCAACTGCAC AACAACCCTG TGAGGTAACT 480 

ACCATCATTG CTCCTATTTT ACATAACAGA AAACTACAGA AATCTGGGGC TGGGCGTAGT 540 

GGCTCATGCC TGAAATCCCA GCACTTTGGG AGACCCTGTC TCTAAAAAAA ATTTTTTTTT 600 

GGCCGGACGT GGTGGCTCAC ACCTGTAATC TCAGCACTTT GGGAGGCTAA GGCAGGCAGA 660 

TCACAAGGTC AGGAGTTCTA GACCAGCCTG GCCAACATGG CAAAACCCTG TGTCTACTAA 720 

AAATACAAAA AATAGCTAGG CGTGGTGGCA GGTGCCTGTA ATCCCAGCTA CTCAGGAGGC 780 

TGAGGCAGGA GAATCCCCTG AACCTGGGAG ATGGAGGTTA CAGAGAGCCG AGATCGTGCC 840 

GCTGCACTCC AGCCTGGGCA ACAAGAGCAA GACTCTGTCT CGAAAAAAAT AAAAATAAAA 900 

ATAAAAATAT TTTTTTAAAA ATTAGCTGGG TGTGGTAGCA CATGCCTGTA GTCCCAGCTA 960 

CTTGGGAGGC TGAGGTAGGA GGATCACTTG AGCCCAGGAG GTCAAGGCTG CAGTGGGCTG 1020 

TGATGGCGCC ACTGCACTCT AGCCTTGGTG ACAGCAAGAC CCTGTCTCAA AAAAAAAAAA 1080 

AAGAGAAATC GGGCAACTTC CCCA2^ATCG CGCAGTTAAC TAGTGGCATA GCTTCACTCA 1140 

AACTCGAAGT CTTAATCAGG ACACTCTACC AAATGAGATC AACGGCTCAG TAATGGATTG 1200 

GCATCCAGTA TGAAGACTGG ACCAGCAGGG AGAACTATGA TGCGTACAGC CTAGAGCCTG 1260 

AAGCAGATTT CACAGCCTCA GAGGTGGCAC AGGCTGACTC ACAACCCGGG GCAGAAAGGG 1320 

ACCAGCCCAG AAACAGTGAC CCAGAATCAC AGGGAAGTAG AAATGGGATT CGGCACAATG 1380 
AAGCCCCTCC TTGACCCCAT GCTGCTTACC CTCAGGGGCG CAGGAGTTAG TCGCTCAGGC 1440 

GGCTCAAAGG TCTTGACGGT GGAGAACACC ATCCCCAGGG ATTCCCGACG CGGTGATGCC 1500 

ATCAAAGCGT TAATTCTGAG ATGGGCCTGC CCGGGTGCGG ACTCTGCCGC AGCAAGAGAA 1560 

GGGTTAACTG CCCCGGGCCT TCGCCGTGGG GGCGGGGCCT CGGGGAGGGT CACAGCCCGG 1620 

GACTGAGACC CGAGGTTAAC CGCCCGGGGT GGGCXCCACG GGGGCGGGGC ATGCTCTCCG 1680 

CGGCTGCTGC CGGTATAGAG CGGTAACTGC CCAGGAGGGG GCGGGGCCCC ACAGGGGCGT 1740 

GGCCTCGGAG CTGCACGGCC GTGGGCGGCG ATGAGAGGGT TAAGCCCCAG AGGGCCCTGG 1800 

AGGGGCGGGG CCGCGGGACG GGCTCGGCCC AAGGGAGGAG CTGGGGGCGG AAGCGGCCGG 1860 

CGGTCTGCGC CCTGCGCGCC TCGGCTTCTT ^CCGCCCGGC TCCTTCAGAG GCCCGGCGAC 1920 

CTCCAGGGCT GGGAAGTCAA CCGAGGTTCG GGGGCAGCGG CGAGGGCTCC GGGCGAGTAA 1980 

GGGGGATGGT CCATGCTGAG GCCCAAATGG GGCGAACTCG CGAGAGTCTC TGGCGACCTG 2040 

GATCAGATGG GGCGAGGGCA GATGAAGGGC CCAGGAGCTT TGGOGCAGCG AGGAGGGAGG 2100 

AGCGGGCCCG TTGGCAAACT TGGGTGAAAG GATGGGGTAC CTGGGTGACG AGCCCCCGCC 2160 

AGGATTCTGC TCTTCACGCC CCTTTTCTCC CAGCTCCCTT CCAGGXCAAT CCAAACTGGA 2220 

GCTCAAGTTT CAGAAGAGAA AGACGCCCCA GCAAGCCTCT TTCGGGGAGT CCTCTAGCTC 2280 

CTCACCTCCA TGGGCCAGAC AGCTCTGGCA GGGGGCAGCA GCAGCACCCC CACGCCACAG 2340 

GCCCTGTACC CTGACCTCTC CTGTCCCGAG GGCTTGGAAG AGCTGCTGTC TGCACCCCCT 2400 

CCTGACCTGG GGGCCCAGCG GCGCCACGGT TGGAACCCCA AAGACTGTTC AGAGAACATC 2460 

GAGGTCAAGG AAGGAGGGTT GTACTTTGAG CGGCGGCCCG TGGCCCAGAG CACTGATGGG 2520 

GCCCGGGGTA AGAGGGCCTA TTCAAGGGGC CTGCACGCCT GGGAGATCAG CTGGCCCCTA 2580 

GAGCAGAGGG GCACGCATGC CGTGGTGGGC GTGGCCACGG CCCTCGCCCC GCTGCAGACT 2640 

GACCACTACG CGGCGCTGCT OGGCAGCAAC AGCGAGTCGT GGGGCTGGGA CATCGGGCGG 2700 

GGGAAGCTGT ACCATCAGAG CAAGGGGCCC GGAGCCCCCC AGTATCCAGC GGGAACTCAG 2760 
GGTGAGCAGC TGGAGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGAACT 2820 

CTGGGCTACG CTATTGGGGG CACCTACCTG GGGCCAGCAT TCCGCGGACT GAAGGGCAGG 2880 

ACCCTCTATC CGGCAGTAAG CGCTGTCTGG GGCCAGTGCC AGGTCCGCAT CCGCTACCTG 2940 

GGCGAAAGGA GAGGTGAGGC CTGGGGCAGA CGTGGGGAGA ACTTTCTGTC CCTGGTGGCA 3000 

GTGGTTTGGG ATGGAAACTC TTCTGACAAG AGCAGAGGGG ATGGACCTTC ATCCAGCCTG 3060 

CCTCAACCTC TGTTCAGTGC TGGGAAAGGC TAGGGGTCTT CACAGCTGTT ATTTAATTTA 3120 

ACCCAACAGC AATAGAGGTG AAACAGGCTT GAGAAAGCAA CTTTCTCAAG TTCTCTTGGC 3180 

CAGTAAATGG TGAACCTTCA GAATGGAGGG AGGAACTGCA GGGATGAGAG AATTCAGGAG 3240 

ATATCAACCC CTGAGCAAGA GGTGCAAAGC GTTAGGTACT GGGTTTGATG TACAGGTCCA 3300 

AAAGAAGGAT GGGCAGAGCC AGGTACCCAG GCTGTATACC GGATTCCCTG GGCTCTAACC 3360 

TGTCTCTGTG CCACATACCT ACTTCCTTCC TCAGCCACAC CTCTGGATGG AGACACTGQG 3420 

GCCCTGGGCA CCAGGGAGGA GAGCAGTGGA GGAGGCAGOG CCTTAGGGTG GGGCAGCAGG 3480 

GGAGGAGCCT CCCCAGGAAC TGACTGGGTC CAGGGCTTGG AGCTGCTCTC TGCAGTTGTG 3540 

TGGGCTGTAG AGTGGAGGGC CATCCCTCCT CACCTCAGCC CCAGCTCCCA AGCCTCTGGA 3600 

GTCAAftSCCT GGGCCAGCTC CACCACTGTC AGAGCCACCT TGGCCTGTTG TTTAGAGGGC 3660 

CTTAGCCAGC TCTTCACCCC CAGCTCTGAC TAGGGATGTG TGAAATCTTA TCTGGGAGGC 3720 

AGAACTTCCG GGTATCTCAA ATTCCCCXTT CAGCCAGGTG GGCACACTCG AAGCAGGAAA 3780 

GCAGAAAGGC ATCTGAGTAG GACCCCGTAG TTTGAGGACA TCTGGCTGGT GGCTGCACCC 3840 

ATACTTACAT TCCCCTCCTT CTCTCTCCCA GCGGAGCCAC ACTCCCTTCT GCACCTGAGC 3900 

CGCCTGTGTG TGCGCCACAA CCTGGGGGAT ACCCGGCTCG GCCAGGTGTC TGCCCTGCCC 3960 

TTGCCCCCTG CCATGAAGCG CTACCTGCTC TACCAGTGAG CCCTGTGATA CCACAGACTG 4020 

TGCTGAGGTC TTGCCACCAC CCCTCCCCTT GGGGAGGTGG GGAGGCACTG CTGGCCTAGA 4080 
CCAGCTGCTG AAAGCTGGTO AGGCTGAGCC CCTACCCCAA CCCAAGCTCT GCGGAAATCA 4140 
ACAGCCCCAG AGCCACTTGG AGGGAGGAAG AAAGGGAGCC GGCGTTCAAG GCTATGACAG 4200 

TCTGCTACGC AAAACATTTT TTCAAGTAAA AATAGTAAGA GATGTTGTTA TAGAAACCTG 4260 

TTCTTGTTTT TTTTTTTTTC TTGCACAAAT GATCATTTAT ATAGCTGCCT CAAAAAGGAA 4320 

GATTATCTGG GCAAGTCCAG TGAAGGCAGA CAAACCACAA GACCTAGTGC CAGGTTTATT 4380 

CCCTCACATG GGTGGTTCAC ATACACACCA CAGAGGCACG GGCACCATGG GAGAGGGCAG 4440 

CACTCCTGCC TTCTGAGGGG ATCTTGGCCT CACGGTGTAA GAAGGGAGAG GATGGTTTCT 4500 

CTTCTGCCCT CACTAGGGCC TAGGGAACCC AGGAGCAAAT CCCACCACGC CTTCCAXCTC 4560 

TCAGCCAAGG AGAAGCCACC TTGGTGACGT TTAGTTCCAA CCATTATAGT AAGTGGAGAA 4620 

GGGATTGGCC TGGTCCCAAC CATTACAGGG TGAAGATATA AACAGTAAAG GAAGATACAG 4680 

TTTGGATGAG GCCACAGGAA GGAGCAGATG ACACCATCAG AAGCATATGC AGGGAAAGGG 4740 

CAGTTACTGG GCTTCTGGGC TGCTTAGTCC CTGGCTTGGC AGGAAGGGTA GGGAAGATGG 4800 

ATGGGGCTCA TTGTTTGGCA TTGATGATGT CCACGAATTC GGGCTTGAGG GAAGCACCAC 4860 

CCACAAGGAA GCCATCCACA TCAGGCTGGC TGGCCAGCTC CTTGCAGGTT GCCCCAGTCA 4920 

CAGAGCCTGG GAAGGGAGCA GAACAAGGGC TTGGTCAAGA ATGGGATGAG TCTGCCCCAT 4980 

CCCCAGGTCC ATGTCCGAGG GCTCAGTCTA GTCCTCAGCC CACTCCACCT CAGCCGGGAA 5040 

CCAAAGCCAC TCACCTCCAT AAATGATACG GGTGCTCTGA GCCACCGCAT CAGAGACGTT 5100 

GGACTTCAGC CATCCTCGGA GCTTCTCGTG TACTTCCTGG GCCTAGAACA AGAAGCTGGC 5160 

CTAAGTAAGA CCTTTTCTCC CTCTCTAAGA GGAAAAATCA CTGGCACCAG TGGACACTTA 5220 

GTGTGGTTTC TGACTGAGTC AGAGTACCAG GGCTCTGATC CAAGCCAGGC CCTGGACTGG 5280 

ATGCCCTTGG ACAAGTCACT GTCTCTGGGT TCAAGGTCTC TGTGTCTTTG AAATAAGGGG 5340 

TTGCCCCATG TGGGCTGTGT CTGTCCAAAC CTATTGAGGC AGGCTGGGAT GAGGGCAGGG 5400 

CTCCTGGGCC CGGTTACCTG TTGGGGTGTT GCAGTCTTGC CAGTACCAAT GGCCCACACA 5460 

GGCTCATAGG CCAGGACGAC CTTGCTCCAG TCCTTCACGT TATCTGCAGG GCAGAGATAC 5520 
AGATGGAGGG AAGGGTGAAC AAGAAAGAGC TCTCCAGCCA GGTTCTCCGG AGTACGAAGA 5580 

ACGGTGGCCT ACTGCCCCCT AGTGGACATT GGGGG 5615 



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Met Gly Gln Thr Ala Leu Ala Gly Gly Ser Ser Ser Thr Pro Thr Pro 
1 5 10 15 

Gln Ala Leu Tyr Pro Asp Leu Ser Cys Pro Glu Gly Leu Glu Glu Leu 
20 25 30 

Leu Ser Ala Pro Pro Pro Asp Leu Gly Ala Gln Arg Arg His Gly Trp 
35 40 45 

Asn Pro Lys Asp Cys Ser Glu Asn Ile Glu Val Lys Glu Gly Gly Leu 
50 55 60 

Tyr Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Ala Arg Gly 
65 70 75 80 

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro 
85 90 95 

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu 
100 105 110 

Ala Pro Leu Gln Thr Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser 
115 120 125 

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser 
130 135 140 

Lys Gly Pro Gly Ala Pro Gln Tyr Pro Ala Gly Thr Gln Gly Glu Gln 
145 150 155 160 

Leu Glu Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly 
165 170 175 

Thr Leu Gly Tyr Ala Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg 
180 185 190 

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ala Val Ser Ala Val Trp Gly 
195 200 205 

Gln Cys Gln Val Arg Ile Arg Tyr Leu Gly Glu Arg Arg Ala Glu Pro 
210 215 220 

His Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Asn Leu Gly 
225 230 235 240 

Asp Thr Arg Leu Gly Gln Val Ser Ala Leu Pro Leu Pro Pro Ala Met 
245 250 255 

Lys Arg Tyr Leu Leu Tyr Gln 
260 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
AGCTAGATCT GGACCCTACA ATGGCAGC 

(2) INFORMATION FOR SEQ ID NO: 50: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AGCTAGATCT GCCATCCTAC TCGAGGGGCC AGCTGG 
CLAIMS: 

1. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary 
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a 
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C 
wherein said protein comprises a SOCS box in its C-terminal region. 

2. A nucleic acid molecule according to claim 1 wherein the protein further comprises a 
protein:molecule interacting region. 

3. A nucleic acid molecule according to claim 1 wherein the protein:molecule interacting 
region is located in a region N-terminal of the SOCS box. 

4. A nucleic acid molecule according to claim 2 or 3 wherein the protein:molecule 
interacting region is a protein:DNA binding region or a protein:protein binding region. 

5. A nucleic acid molecule according to claim 4 wherein the protein:molecule interacting 
region is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats. 

6. A nucleic acid molecule according to any one of claims 1-5 wherein the SOCS box 
comprises the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xj may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; and 
X28 is L, I, V, M, A or P. 
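
The SOCS box consensus recited in the claims can be read as a pattern over one-letter amino acid codes. The following is a minimal sketch, not part of the claimed subject matter, that transcribes the residue sets for X1 to X28 into a Python regular expression. The names `SOCS_BOX`, `find_socs_box` and `c_terminus` are hypothetical. The spacers [Xi]n and [Xj]n are written as runs of 1 to 50 arbitrary residues, and X16 is assumed to take the set L, I, V, M, A, P, G, C, T or S. The test fragment is residues 224 to 263 of SEQ ID NO: 48 as printed in the sequence listing, converted to one-letter code.

```python
import re

ANY = "."          # "any amino acid"
H = "[LIVMAP]"     # residue set shared by X1, X4, X12, X17, X20, X24 and X28

# One entry per position of the consensus, in order.
parts = [
    H, ANY, "[PTS]", H, ANY, ANY,                # X1 to X6
    "[LIVMAFYW]", "[CTS]", "[RKH]",              # X7 to X9
    ANY, ANY, H, ANY, ANY, ANY,                  # X10 to X15
    "[LIVMAPGCTS]",                              # X16 (assumed full set)
    ".{1,50}",                                   # [Xi]n, n from 1 to 50
    H, ANY, ANY, H, "P", "[LIVMAPG]", "[PN]",    # X17 to X23
    ".{1,50}",                                   # [Xj]n, n from 1 to 50
    H, ANY, ANY, "[YF]", H,                      # X24 to X28
]
SOCS_BOX = re.compile("".join(parts))


def find_socs_box(protein):
    """Return the (start, end) span of the first consensus match, or None."""
    match = SOCS_BOX.search(protein)
    return match.span() if match else None


# Residues 224-263 of SEQ ID NO: 48, one-letter code.
c_terminus = "PHSLLHLSRLCVRHNLGDTRLGQVSALPLPPAMKRYLLYQ"

if __name__ == "__main__":
    print(find_socs_box(c_terminus))
```

A regular expression only tests the primary-sequence pattern; it says nothing about whether a matching protein modulates signal transduction.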



7. A nucleic acid molecule according to claim 6 wherein the protein modulates signal 
transduction. 
8. A nucleic acid molecule according to claim 7 wherein the signal transduction is modulated 
by a cytokine or a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 

9. A nucleic acid molecule according to claim 8 wherein the protein modulates cytokine- 
mediated signal transduction. 

10. A nucleic acid molecule according to claim 9 wherein the signal transduction is mediated 
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, 
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF. 

11. A nucleic acid molecule according to claim 10 wherein the signal transduction is mediated 
by one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin. 

12. A nucleic acid molecule according to claim 11 wherein the signal transduction is mediated 
by IL-6. 

13. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence encodes an 
amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 8, 
SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ ID 
NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 46 
or SEQ ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or part 
of the listed sequences or a nucleotide sequence which hybridizes to the nucleic acid molecule 
under low stringency conditions at 42°C. 
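
A percentage threshold of this kind can be illustrated with a simple calculation. The sketch below is an assumption-laden illustration only: it computes percent identity between two gap-free sequences of equal length, which is the simplest stand-in for "similarity" and may be narrower than the term as used in this specification. The function name `percent_identity` is hypothetical. The two example strings are the first eight residues of SEQ ID NO: 46 and SEQ ID NO: 48 in one-letter code.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical residues.

    Assumes the two sequences are already aligned without gaps
    and have the same non-zero length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Residues 1-8 of SEQ ID NO: 46 and SEQ ID NO: 48 differ only at position 8.
    print(percent_identity("MGQTALAR", "MGQTALAG"))
```

Sequences of different length would first need an alignment step, which this sketch deliberately leaves out.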

14. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence is 
substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ 
ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO. 
20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ 
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 
34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ 
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a nucleotide sequence having 
at least 15% similarity to all or a part of the listed sequences or a nucleotide sequence capable of 
hybridizing to the listed sequences under low stringency conditions at 42°C. 

15. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary 
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a 
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C 
wherein said protein exhibits the following characteristics: 

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises 
the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xj may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; 
X28 is L, I, V, M, A or P; and 

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats 
or other protein:molecule interacting domain in a region N-terminal of the SOCS box; and 

(iii) modulates signal transduction. 

16. An isolated protein or a derivative, homologue or mimetic thereof comprising a SOCS box 
in its C-terminal region. 

17. An isolated protein according to claim 16 wherein the protein further comprises a 
protein:molecule interacting region. 

18. An isolated protein according to claim 17 wherein the protein:molecule interacting region 
is located in a region N-terminal of the SOCS box. 
19. An isolated protein according to claim 16 or 17 wherein the protein:molecule interacting 
region is a protein:DNA binding region or a protein:protein binding region. 

20. An isolated protein according to claim 19 wherein the protein:molecule interacting region 
is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats. 

21. An isolated protein according to any one of claims 16-20 wherein the SOCS box comprises 
the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xj may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; and 
X28 is L, I, V, M, A or P. 

22. An isolated protein according to claim 21 wherein the protein modulates signal 
transduction. 

23. An isolated protein according to claim 22 wherein the signal transduction is modulated by 
a cytokine or other endogenous molecule, a hormone, a microbe or a microbial product, a parasite, 
an antigen or other effector molecule. 

24. An isolated protein according to claim 23 wherein the protein modulates cytokine- 
mediated signal transduction. 

25. An isolated protein according to claim 24 wherein the signal transduction is mediated by 
one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, 
LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF. 



26. An isolated protein according to claim 25 wherein the signal transduction is mediated by 
one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin. 

27. An isolated protein according to claim 26 wherein the signal transduction is mediated by 
IL-6. 

28. An isolated protein according to claim 16 wherein said protein comprises an amino acid 
sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 8, SEQ ID NO. 
10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ ID NO. 25, SEQ 
ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 46 or SEQ ID NO. 
48 or an amino acid sequence having at least about 15% similarity to all or part of the listed 
sequences. 

29. An isolated protein according to claim 16 wherein the said protein is encoded by a 
nucleotide sequence substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, 
SEQ ID NO. 9, SEQ ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 
17, SEQ ID NO. 20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ 
ID NO. 27, SEQ ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 
33, SEQ ID NO. 34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ 
ID NO. 40, SEQ ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a nucleotide 
sequence having at least 15% similarity to all or a part of the listed sequences or a nucleotide 
sequence capable of hybridizing to the listed sequences under low stringency conditions at 42°C. 

30. An isolated protein or a derivative, homologue, analogue or mimetic thereof having the 
following characteristics: 

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises 
the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xj may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; 
X28 is L, I, V, M, A or P; and 

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats 
or other protein:molecule interacting domain in a region N-terminal of the SOCS box; and 

(iii) modulates signal transduction. 

31. A method of modulating levels of a SOCS protein in a cell said method comprising 
contacting a cell containing a SOCS gene with an effective amount of a modulator of SOCS gene 
expression or SOCS protein activity for a time and under conditions sufficient to modulate levels 
of said SOCS protein. 

32. A method of modulating signal transduction in a cell containing a SOCS gene comprising 
contacting said cell with an effective amount of a modulator of SOCS gene expression or SOCS 
protein activity for a time sufficient to modulate signal transduction. 

33. A method of influencing interaction between cells wherein at least one cell carries a SOCS 
gene, said method comprising contacting the cell carrying the SOCS gene with an effective amount 
of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient to modulate 
signal transduction. 

34. A method according to any one of claims 31-33 wherein signal transduction is mediated 
by a cytokine, a hormone, a microbe or a microbial product, a parasite, an antigen or other effector 
molecule. 

35. A method according to claim 34 wherein the cytokine is one or more of EPO, TPO, 
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or 
M-CSF. 

36. A method according to claim 35 wherein the cytokine is one or more of IL-6, LIF, OSM, 
IFN-γ and/or thrombopoietin. 
37. A method according to claim 36 wherein the cytokine is IL-6. 

38. A method according to any one of claims 31-37 wherein the SOCS gene encodes a protein 
having a SOCS box comprising the amino acid sequence: 

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28 

wherein: X1 is L, I, V, M, A or P; 
X2 is any amino acid residue; 
X3 is P, T or S; 
X4 is L, I, V, M, A or P; 
X5 is any amino acid; 
X6 is any amino acid; 
X7 is L, I, V, M, A, F, Y or W; 
X8 is C, T or S; 
X9 is R, K or H; 
X10 is any amino acid; 
X11 is any amino acid; 
X12 is L, I, V, M, A or P; 
X13 is any amino acid; 
X14 is any amino acid; 
X15 is any amino acid; 
X16 is L, I, V, M, A, P, G, C, T or S; 
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xi may comprise the same or different amino 
acids selected from any amino acid residue; 
X17 is L, I, V, M, A or P; 
X18 is any amino acid; 
X19 is any amino acid; 
X20 is L, I, V, M, A or P; 
X21 is P; 
X22 is L, I, V, M, A, P or G; 
X23 is P or N; 
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids 
and wherein the sequence Xj may comprise the same or different amino 
acids selected from any amino acid residue; 
X24 is L, I, V, M, A or P; 
X25 is any amino acid; 
X26 is any amino acid; 
X27 is Y or F; and 
X28 is L, I, V, M, A or P. 

39. A method according to claim 38 wherein the SOCS gene comprises a nucleotide sequence 
selected from SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ ID NO. 11, 
SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO. 20, SEQ ID 
NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ ID NO. 28, 
SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 34, SEQ ID 
NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ ID NO. 42, 
SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47. 

40. A method according to claim 38 wherein the SOCS gene encodes a protein comprising an 
amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 8, 
SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ ID 
NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 46 
or SEQ ID NO. 48. 
ABSTRACT 

The present invention relates generally to therapeutic and diagnostic agents. More particularly, 
the present invention provides therapeutic molecules capable of modulating signal transduction 
such as but not limited to cytokine-mediated signal transduction. The molecules of the present 
invention are useful, therefore, in modulating cellular responsiveness to cytokines as well as other 
mediators of signal transduction such as endogenous or exogenous molecules, antigens, microbes 
and microbial products, viruses or components thereof, ions, hormones and parasites. 
cgaatliccgggcgggctgtgtgagtctgtgagtggaaggcgcgccggctcttttgtctgagtgtgacccggtggc^^^ 
ccagqcattccggtgatttcctccgggcagtccgcagaagccgcagcggccgcccgcgctctctctgcagtctccacacc 

cgggagagcctgagcccgcgtcacgcccctcagcccccgctgagtcccttctctgttgtcgcgtccgaa^cgagtte^ 

gaa«agLggtgccccatagATGGCCAGCTTTCCCCC<aiGGG™CGAG2U^ 

GGAACTCTTGGCTCCAGCAGCTCCTTTTGACAAGAAATGTGGTGGTG^ 

CCTACTTTGCGTGGTCACWIGGATATCGCATAGTGAAGCTTGTCCCGTGGTCCCAGT(KCGT^ 

GGTTCCAAAAATGTTACCAATTCAAGCTGTCTAAAATTGGCAAGACAAAACAGTAATGGTGGTCA 

TGAGCACGTTATAGACTGTGGAGACATAGTCTGGAGTCTTGCTTTTGGGTCTTCAGOTCC^ 

TTAATATAGAATGGCATCGGTTCCGATTTGGACAGCaiTCAGCTACTCCTTGCCACAGGArrAA^ 

ATCTGGGATGTATATACAGCAAAACTCCTCCTTAATTTGGTAGACCACATTGAAATGGTTAGAGATTT^ 

AGATGGGAGCTTACTCCTTGTATCAGCTTCAAGAGACAAAACTCTAAGAGTGTGGGACCTGAA^ 

TGAAAGTATTGCGGGCACATCAGAATTGGGtGtACAGTTGTGCATTcTCTCCcGACTGTTCT^^ 

GCCAGTAAAGCAGTTTTcCTTTGGAATATGGATAAAtACACCATGATTAGGAAGCtGGAAGGTCAT^^ 

AGCTTGTGACTTTTCTCCTGATGGAGCATTGCTAgCTACTGCATCXTATGACaCTCGrGT^ 

ATGGAGACCTTCTGATGGAGTTTGGGCACCTGTTTCCCTCGCCCACTCCAATATTTC^^ 

GTGAGAGCTGTGTCTTTCAGTCATGATGGACTGCATGTTGCCAGCCTTGCTGATGATAAAAT^ 

CGATGAGGATTGTCCGGTACUU^GTTGCACCTTTGAGCAATGGTCTTTGCTGT^ 

CTGCTGGGACACATGATGGAAGTGTGTATTTTTGGGCCACTCCAAGGCAAGTCCCTAgCCTTC^ 

TCAATCCGAAGAGTGATGTCCACCCAAGAAGTCCAAAAACTGCCTGTTCCTTCCA^ 

CGGTTAGactgaagactgcctttcctggtaggcctgccagacagagcgccctttacaagacacacctcaAgctttacctc 
gtgccgaatt 
MASFPPRWEKEIVRSRTXC^IJLAPAAPrDKKCGCTNim^aFAPDGSYrAWSQGYRIVK 
SSCLKIi2^QNSNGGQKNKPPEHVlDCGDIWSLAFGSSVPEKQSRC\^IEWHRJeTirGQ 
KLIiLNLVDHIEMVROLTFAPDGSI*LLVSASIU>KTIJlVWDIJKDDGNMVKV^^ 
WNMDKYTMIRKLEGHHaDWACDFSPDGAIJ^TASYDTRVYVWDPHNGD^^ 

H DGLH VASIiADDKMVHrWR I DE DCPVOVAPLSNGLCCAFSTDGSVLAAGTHDGSVYFWATPR QVPSLOHICRMSIRRVMS 
TOEVQKIiPVPSKILAFL SYRG* 
il4.1 

CTGTCTTCCTCCGCAGCGCGAGGCTGGGTACAGGGTCTATTGTCTGTGGTTGACTCCGTACTTTGGTCTGAGGCCT^ 

GAGCTTTCCCGAGGCAGTTAGCAGAAGCCGCAGCGACCGCCCCCGCCCGTCTCCTCTGTCCCTGGGCCCGG^ 

TTGGCGTCACGCCCTCAGCGGTCGCCACTCTCTTCTCTGTTGTTGGGTCCGCATCGTATTCCCGGAATC^ 

CATAGATGGCCAGCTTTCCCCCGAGGCTCAACGAGAAAGAGATCGTGAGATCACGTACTATAGGTG^ 

GCAGCTCCTTTTGAC^GAAATGTGGTCGTGAAAATTGGACTGTTGCTTTTGCTCCAGATGGTTCATACT^ 

ACAAGGACATCGCACAGTAAAGCTTOTTCCGTGGTCCCAGTGCCTTCAGAACTG?TCTCTTGCATG<^ 

CCAATTCiUlGCAGTTTAAQATTGCCAAGACAAAATAGTGATGGTGGTCAGAAJ^^ 

GTGGAGATATAGTCTGGAGTCTTGCTTTTGGGTCATCAGTTCCAGAAAAACAGAGTCGCTGTGTAAATATAGAATGGCAT 

CGCTTCAGATTTGGACAAGATCAGCTACTTCTTGCTACAGGGTTGAACAATGGGCGTATC^^ 

AGGAAAC7CCTCCTTAACTTGGTAGATCATACTG2U^GTGGTCAGAGATTTAACTTTTGCTCCAG 

b4-2 

CTCTGTATGTCTGAATGAAGCTATAACATTTGCCTTTTTATTGCAGGTTTTCCTTTGGAAT^ 

TACGGAAACTAGiU^GGACATCACCATGATGTGGTAGCTTGTGACTTTTCTCCTGATGGAGCATTACTGGCTA 

TATGATACTCGAGTATATATCTGGGATCCACATAATGGAGACATTCTGATGGAATTTGG^^ 

TCCAATATTTGCTGGAGGAGCAAATGACCGGTGGGTACGATCTGTATCTTTTAGCCATGATGGACTC^ 

TTGCTGATGATAA?UiTGGTGAGGTTCTGGAGAATTGATGAGGATTATCCAGTGCAAGTTGCACC 

TGCTGTGCCTTCTCTACTGATGGCAGTGTTTTAGCTGCTGGGACACATGACGGAAGTGTGTATTTTTGGGC^ 

GCAGGTCCCTAGCCTGCAACATTTATGTCGCATGTCAATCCGAAGAGTGATGCCCACCCAAG 

TXCCTTCCAAQCTTTTGGAGTTTCTCTCGTATCGTATTTAG23AGATTCTGCCTTCCCTAGTAGTAGGG^ 

CACTTAACACAAACCTCAAGCTTTACTGACTTCyLATTATCTGTTTTTAAA^ 

tcttgtactgcattttgatcagttgagcttttaaaatattatttatagacaatagaagtatttctgar£3^ 

aaatttttttaaagatctaactgtgaaaacatacatacctgtacatatttagatataagctc^ 

cttttgcttttctgatttttagttctgacatgtatatattgcttcagtagagccacaatatgtatc 

ca3\ggaaattttaaattctgggacactgagttagatggtaaatactgacttacgaaagttgaattgggtg^ 

atcacctgmgtcagcagtttgagactagcctgocaaacatgaargaaaccctgtctctactai^^ 

aaO 
cggcacgagccgggctccgtccggaiggaagcgaggct.gcgccgccggcccggcaggagcggaggacgggaingcgcgggcg 

gtcgcgGtcgccctgtcgctgactgcgctgccccggcccatccttgcctggccgcagg-tgccctggatgaggccgccgcg 

cgtgtcGcggccgct-gag-tgtcccccgcggtcgcccggcgcctgccctcaagcggccgcctctccttgcccgggtccccg 

tt-ttcccGcggcgcagtcc-tcctccggtgggcgcctccgcacctcggcgcaggcggcacggccctcgggccgggatggat 

ccgccgggaagaggaagacaagccggggcgttgagcccctgcgcacggtgccgccgcgcgtagtgggagcttactcgcag 

taggctctcgctcttctaatcakTGGATi^GTGGGGAAi^TGTGGAACAACTTAAAATACAC^ 

GCCACGAGGGAGGAAGCCGTiUi.TGAGiUVCGTGGAGATGAACCCC^ 

CTGGGAGAGGCAGCTCCCCACXIAAGAGAGCACTCCCTTAAGAGiyiAATGTTGC^ 

GACCTTTTCCAGGCGGAACCAAAACTGTGCCGCAGAGATCCCTCAAGTGGTTGAAA 

CgGGTGCCACCCCAGGJU^CGAGGCTTGCACGGAGAGACTCCTACTCGCGCXUiCGcCCCGTGGQ 

TCCTGTTCCACAAAGACCCAGAGTTCaLTTGGATACCGAGAAAAAGTTTGGTAGAACTCGAAGCGGCCOT 

GCGGCGCTATGGAGTCAGCTCCAO^GCAGGACaTGGACAGCGayrTCTAGCCGCGCGGTCGGGAGCCGCTC 

GGCTCCAGGACACGGTGGGTTTGTGTTTTCCCATGAGAACTTACAGCAAC^ 

AAAATACATCTTTCTGAATTAATGCTGGAGAAATGCCCTrarCCTGCTGGCTCG 

TAAACAGCATACCGCCCCTGTGAGCCCACACTCAACATIOTTTGATACATTTGATCCATCACTGGTGTCT^^ 
AAGAAGATAGGCTTCGCGAGAGAAGACGGCTTAGTATCGAAGAAGGGGTGGATcCcCcTCCCi^CGCAC^^ 
TTTGAAGCTACTGCACAGGTaLACCCATTGTATAAGCTGGGACCAWlGTTAGCTCCTGGGAT 
TGGTTCTGCAATTCCACAAGCsAATTGTGACTCAGAAGAGGATTCAACC^ 

AGCGCCAGGTGTCCGGGGACAGCCACGCGCACGTTAGCAGACAGGGAGCTTGGAAAGTTCATACG^ 

CACTGCCTCGTGCCAGATTTGCTTCAGATCaiCAGGGA&TCCCTGTTACTQGGGCGTGATGGACCG^ 

CCTTCTAGAAGGGAAACCGGAAGGCACGXTCTTGCTCAGGGACTCTGCAOiGGAGGACTACCTCTTCTCT 

GCCGCTACAACAGGTCTCTGCACGCCCGGATCGAGCAGTGGAACCACAACTTCAGCTTCGATGCCCAT^ 

TTTCACTCCTCCAC^^GTCACGGGGCTTCTCGAACACTATAAAGACCCCAGCTCTTG^ 

GATATCACTGAATAGAACTTTCCCrrcCAGCCTGCAGTATATCTGCCGCGCAGTGATCTGCA 

GGATTGACGGGcTCCCGCTACCGTCGATGTTACAGGATTTTTTAAAAGAGO'ATCATTATAAACAAAAAG 

TGGTTAGAACG^GArCCAGTCAAAGCAAAGTAi^tcctgtccccaaagggcactaact.aag^ 

mgaactgGacccatagg2-aggcagt:cagctgctaggat*ttcccacccagaatgggagctt.agtcattagcctctgcccta 
t-gggS^ccgc'tgtt.cctcagacaaaggt.gcc-tagggacagcaagat.ggcttgcaggtgtitcggtgggctgtgaeaac'tga 
gggaggcaactct.ggggcat:tt:gct:atgaagaattci.atttcttaccgaagaacaaattattaatattggatgggt.att-t 
caatagtgtgactaatgtttgaaattattttttctaagaa-tttttctataaccttcagaaaaagtagtgatgtttgtagt 
tiactat:aaatcaagGt:^t:.gaaag^t:caaaacaaacaagttaaat.aaaagactacct:tcc^tt:t.agagaaaacaaatgcaa 
gttttcccagccacaggcattgtgcactgttaatgttngcttgtUatcagctcctttGtcctcc 
MDKVGKMWNNLKYRCQNLFSHEGGSRhrENVBMNPNRCPSVKEKSISLGEAAPQQESSPI^

NCAAEIPQVVEISIEKDSDSGATPGTIU*ARRDSVSRHAPWGGKKKHSCSTKTQSSLDra 

MQDMDSVSSiWlVGSRSLRQRIiQDTVGLCPPMRTYSKQSKPLFSNiaUCIHLSELMIiEKCPFPAGSDI^ 

SPeSTFFDTFDPSLVSTEDEEDRIiRERRKLSIESGVDPPPNAQIHTFEATAQVNPLYKLGPKIJ^ 

NCDSJSEDSTTLCLQSRRQKQRQVSGDSHAHVSRQGAWKVHTQIDYIHCLVPDIXQITGNPCYWGN^DRYEA^ 

GTFLIJ«>SAQEDYr.FSVSraRYNRSriHAKI£QWNaNFSFDAHDPCVFaSSXVTGLLEHYKDP 

PFSLQYICRAVICRCTTYDGlDGLPLPSMLQDFIJ aBYHYKQKVRVRWIJlRXPVKAK * 



FIGURE 21 



GATTiLAACAGCATACAGCTCCTGTGAGCCCACATTCAACATTTTTTGATACrTTTGATCCJ^ 

ATGAAGAAGATAGGCTTAGAGAGAGAAGGCGGCTTAGTATTGAAGAAGGGGTTGATCCCCCTCCC^^ 

ACATTTGAAGCTACTGCACAGGTTAATCCATTATWTAAACTGGGACCAAAATTAGCTCCTGGAATGACT 

GGACAGTTCTGCAATTCCACAAGCTAATTGTGACTCGGAAGAGGATACAACCACCCTGTGyTTGCAGT 

AGCAGCGTCAGATATCTGGAGACAGCCATACCCATGTTAGCAGACAGGGAGCTTGGAAAGTCCAC^ 

ATACACTGCTTCGTGCCTGATTTGCTTCAAATTACAGGGAATCCCTGTTACTGGGGAGTGATGGACCGTTATGAAGCAG^ 

AGCCCTTCTCGAAGGGAAACCTGAAGGCACGTTTTTGCTCAGGGACTCTGCGCAAGAGGACTACTTCTT^ 

TCCGCCGATACAACAGATCCCTGCATGCCCGAATTGAGCAGTGGAATCACAACTTTAG^ 

GTATTTCACTCCTCCACTGTAACGGGACTTTTAGAACATTATAAAGATCCCyiGTTCGTGCATGTTTTTTGAACCA^ 

TACTATATCACTAAATAGGACTTTCCCTTTTAGCCTGCAGTATATCTgTcGCGCGGTAATCTGCAGGTGCACT 

ATGGAATTGATGGGCTCCCTCTACCCTCAATGTTACAGGATTTTTTAAAAGAGTATCATTATAAACAAA2UiGT^ 

CGCTGGTTGGaACGAGAACCAGTCAAGGCAAAGTAAACTCTCCGGTCCCCAAAGGgTGTTAACTAGGTC 

GCATCAGACAGTACACCTATAGCAAGCACACGTAGCAGTGTTAGGCTTTTTCATACAGTATGT^ 

CTGTCAGAtGCTACCTGCTGTTACTTATTCAGATAAACATGGtGCCTATTGGAACAATAGcQGATAGA^ 

CAGTAAGACTACAAAAACATTTTGCCTATTTCGCTAACAGTTTGGTTTTTAATGGCTGTGGiJl^ 

TGGGGCATTTGTTATGAAGAAATG 



FIGURE 23A 



ggcacgaggcggtggtggcggcggcgggcgcggccacggcggggcgggcgcggaATGAAGGCCCACGGCCCTCKSGGG^ 
aggcgcccgccgcctggggcgggccgcgcgtcctcEe^goccggaga 

G<K;CGCCCCCACCAGTTCGACTGGJ^GTCiU^GCTGCGAGACCTGGA 

CTGGTCTCiUVGGACACTGCGTGGTa^GCTGGTCCCCTGGCCCTTAGAGGAACAGTTCATCCCT 

AGAGCCGAAGCAGCAAGAATGACCOUiAAGGACGGGGCAGTCTGAAGGAGAAGACGCTGGACTGT^^ 

GGGCTGGCCTTCAGCCCGTGGCCCTCTCCACCCAGCAGGAAACTCTGGGCACGTCACCA^ 

TTGCCTGATCCTGGCCACAGGTCTCAACGATGGGCAGATCAAGAOTTGGGAGGTACAGACAGGCCTCCTGCT 

TTTCTGGCCACCAAGACGTCGTGAGAGATCTGAGCTTCACGCCCAGCGGCAGTTTGATTTTGGTCTCTGCAT 

AAGACACTTCGAATTTGGGACCTGAATAAaCACGGTAAGCAGATCOVGGTGTTATCCGGCCATCTGC^ 

CTGCTCCATCTCCCCTGACTGTAGCATGCTGTGCTCTGCJIGCTGGGGAGAAGTCGGTCTTTCTGTGGAGCATGCGCT 

ACACACTAATCCGGAAACTAGAAGGCCACCAAAGCAGTGTTGTCTCCTGTGATTTCTCTCCTGATTC^ 

ACAGCTTCGTATGACACCAGTGTGATTATGTGGGACCCcTACACCGGCGcGAGGCTGAGGTCAC^ 

TGAACCCACCATGGATGACAGTGACGTCCACATGAGCTCCCTGAGGTCCGTGTGCTTCTCACCTGAAG 

CTACGGTGGCAGATGACAGGCTGCTCAGGATCTGGGCTCTGGA&CTGAAGGCTCCGGTTGCCTTTGCTCCGATGAC 

GGTCTTrTGCTGCACGTTCTTCCCACACGGTGGAATTATTGCCACAGGGACGAGAGATC^ 

TCCCCGGGTCCTGTCCTCACTGAAGCACTTATGCAGGAAAGCCCTCCGAAGTTTCCTGACAACGTATa^ 

TGCCAATCCCCAAGAAGATGAAAGAGTTCCTCACATACAGGACTOTCTAi^ 

cagcagtacaagggactggctaggatggagtcaggcagctcacactggaccagtgtggaccttccttcctcccatggcat 

gtgcaagtaggtctgcgtgaccccacttctgtggtgccggccttacct-cgtcttcatccgtggtgagcagccttcgtcag 

tctagtt.gtgtt.gaagccaagtgcagt:t.gtggatgttgctggggtaataaaggcaagcgggctccagagcctctctggtg 

gcggccaagccacactccc-ttaactgggaagtacctgccacgtagggcatttctgctgcctatttccagccagcggctgc 

atggtttgaagttcctccgttgtggtcagaagaactctggtgtttggttccctgctcagctgcgcgt.ggactgggctgag 

ctcctcaccatacactagtgccggcttttgttt.cctgtaaacag-tggttgca'tgtgtagagaagtaacaagcgagtattc 

agatcatacga3gaggcgtt:cctcggt:gcatgacggtcagatggccatttatcagcatatttatt:tgtat:1:t:t.ctcagca 

caiagtaaggtacaactgtg-ttt-tctcaattgtctcgaaaaaacagagttcttaagtggccc&gttg-tggagccafi.gtct 

aag1:cgtgtggagtcagtgGtgacatcactggcttgtgctg1:ctgtcacatgi:gtttgtctctgctgct:t^^ 

gatgtacccrccagttcaactgcccaaaacagacagccccttccaagcaccgttctttgacagcggtagcagctaccbat 

t:caagacgcct.cacacaaaatctgcctt.agaaagttaatatatt:t.taaatta-tt:t:taaaagaaactcaacatcttattct 

t:tggcctttct:taattgat:gctttatggaggcagt:g^taacat:tigt:acagtgtatgcatagaggagt;ct:cctctatttga 

agaacaatgcaaaatgaggctttcattgaagggaaaaaaaaaaaaaaaa 



FIGURE 23B 



MEAGEEPIJIJOi^IJCPGRPHQFDWKSSCETWSVAFSPDGSWrAWSQGHCVVKI.^ 
SLKEKTLDCGQIVWGIxZ^SPWPSPPSRKLWiUElHaPOAPDVSCLILATGLNDGQI^ 

PSGSLILVSASRDKTIJRIWDLNKHGKQIQVLSGHX*QWVTCCSISPDCSMLCSiUlGEXSVFLWSMRSYTLXR^^ 

SCDFSPDSALLVTASYDTSVIMWDPYTGARIJlSLHHTQLEPTMDDSDVHMSSIi^SVCrSPEGLYIAW^ 

KAPVAFAPMTNGLCCTFFPHGGIIATGTRDGBVQFWTAPRV LSSIJCBLCTK^^ 



FIGURE 24 



h6.1 

GACACTGCATCGTCAiUVCTGATCCCCTGGCCGTTGGAGGAGCAGTTCATCCCTA;U^<^^ 

GCAAAAATGAGACGAAAGGGCGGGGCAGCCCMJUlGAGiUlGACGCTGGA 

GCCTGTGNCTTTCCCCACCCAGCAGGAAGCTCTGGGCACGCCACCACCCCCA^ 

CTACGGGACTCAACGATGGGCAGATCAAGATCTGGGAGGXGCAGACAGGGCTCCTGCTTTTGAATCT^^ 

ATGTCGTGAGAGATCTGAGCTTCACACCCAGTGGCAGTTTGATTTTGGTCTCCGCGTCACGGGATAAGACTCT 

GGGACCTGAATAAACACGGTAAACAGATTCAAGTGTTATCGGGCCACCTGCAGTGGGTTTACTGCT 

ACTGCAGCATGCTGTGCTCTGCAGCTGGAGAGAAGTCGGTCTTTCTATGGAGCATGAGGTCCTACACGTTA^ 

TAGAGOGCCATCAAAGCAGTGTTGTCTCTTGTGACTTCTCCCCCGACTCTGCCCTGCTTGTCACGGCTTCTT^ 

ATGTGATTATGTGGGACCCCTACACCGGCGAAAGGCTGAGGTCACTCCACCACACCCAGGTTGACCCCG^ 

GTGACGTCCACATTAGCTCACTGAGATCTGTGTGCTTCTCTCCAGAAGGCTTGTACCTTGC^ 

TCCTCAGGATCTGGGCCCTGGAACTGAAAACTCCCATTGCATlVrGCTCCTATGACCAATGGGCTTTGCTG 

CCACATGGTGGAGTCATTGCCACAGGGACAAGAGATGGCCACGTCCAGTTCTGGACAGCTCCTAGGOT 

AAGCACTTATGCCGGAAAGCCCTTCGAAGTTTCCTAAGAACTTACCaLAGTCCTAGCACTGCCAATCCCC^ 

GAGTTCCTCACATACAGGACTTTTTAAGCAACACCACATCTTGTGCTTCTTTGTAGC^ 

AGTTGCTGGAATAATGGGCCAAACATCTGGTCTTGCATTGAAATAGCATTTCTTTGGGATTGTGAAT 

CCAGATTCCAGTCTACTAGTCATGGATTTTTC 

h6.2 

ACCATGGTTCCAAGWTCCTCTCCYKCCTGTGGTCMRAAGTTGCYYCCGAATGTTGGGCCCAACT 

GCCTCCCCTTCTGACCTGCAGGACAGTTTTCCYGGAGCCCATTTGGTATGAGGTATTAAWTTAGCCT^ 

GGGACTCAGAGGCCGTGCTCCTGACCGATCCAGACACTATTTTTTTTTTTTTTTTTTAAaUVTG^ 

ATGACAAATTTGTATGTCAGATTATAGftAGGATGTATTCTTAAACCGCATGACTATTC^ 

GCCATTTATTAGCATCATATTTATTTGTATTTTCTCAACAGATGTTAAGGTAaWlCTGTCT 

CCATAGTACTTAAATTGAAAAAAAAAA 
FIGURE 26A 



GG<:^CGiGGCGGGGTCAG<3GCGGAGGCTGAGGACCAAGTAGGCATGGCGGAGGGCGG<3ACCC^ 

GGCCGGGACCCGCAGGTCCTAATCTGAAGGAGTGGCTGAGGGAGCAGTTCTOTGACCATCCACTGGAG 

ACAAGACTCCATQATGCa.GCCTATGTAGGGGACCTCCAGACCCTCAGGAACCTACTGCAAGAGGAGA^ 

CATCAATGAGAAGTCTGTCTGGTGCTGCGGCTGGCTTCCCTGCACACCACTGAGGATCGCAGCCACT 

ACTGTGTGGACTTCCTCATACGCAAAGGGGCCGAGGTGGACCTGGTGGATGTCAAGGGGCAGACTG^ 

GTAGTGAACGGGCACTTGGAGAGCACTGAGATCCTTTTGGAAGCTGGTGCTGATCCCAACGGCAGCCGGCACC^ 

CACTCCTQTGTACCATGCCTYTCGTGTGGGTAGGGACGAGATCCTGAAGGCTCTTATCAGGTATGGGGCAGAXGTTC^ 

TCAACCATCATCTGAATTCTGACACCCGGCCCCCTTTTTCACGGCGGCTAACCTCCITGGTGGTCT 

AGTGCTGCCTACCATAACCTTCAGTGCTTCAGGCTGCTCTTGCAGGCTGGGGCAAATCCTGACTTC^ 

TGTCAACACCCAGGAGTTCTACAGGGGATCCCCTGGGTQTGTCATGGATGCTGTCCTGCGCC^^ 

TCGTGAGTCTGTTGGTAGAGTTTGGAGCCAACCTGAACCTGGTGAAGTGGGAATCCCTGGGCCCAGAGGC&AGAGG^ 

AGAAAGATGGATCCTGAGGCCTTGCAGGTCTTTAAAGAGGCCAG^^GTATTCCCAGGACCTTGCTGAGTTT 

GGCTGTGAGAAGAGCTCTTGGCAAATACCGACTGCATCTGGTTCCCTCGCTGCCGCTGCCAGACCC^ 

TGCTTTATGAGTAGcattcacatgcagtgctgactgcaatgrtggaagccgatcacctgcagtgaaaactgacacagactc 

tggcatcctgggaaccatggcctgtgctgccagcttgatccttggctgtcagtgaagaaaaaacggctgtgttctcttgg 

actgtgantctatctcaggtgcttgggccatcgaacgctccttgagtcattgtcaactgagaggcacatacaaacttaat 

tttgttcctcttcagtctctctgttttggattcttcctggcaatgtgtgcagcatgggctgagcctggtgattgccctag 

tggggaaggcttttttctccaggctatgcalictatttatgttcctactttgcaatttattgttcttttaaggcttgatat 

caaaacagaaagaggtttgttaagaaaagatatagggagaaaggaattccggttccgtgcacttgc-tagcctgctttcct 

tgcctgggtttgtctgtctatgctgcctggtgcacatcccttctctttgctgccactgttctattttgggagttgtcttc 

cgtctaagatggcttctggggttc-tatcttattgcacagaggtcccagaacagtgttcatagggcaccatctgctctgcc 

aagggttttctgatgtcttaccctggggatcttcagacagtggttacctttaggagacccacctggaactaaccattaag 

tgactgcccacattcagatcagggaccatcttaatagtactcactgccagtcctcacaagagaagatgacacgggtgctc 

tcttcagacactcccatacaggaagttggaaaat:gtcttggtcacctgggttgttcccaggctacaacttcttggtgti:c 

cactaafaccagratatcctagttttttgggttgactgttccctccccactttccttgaancccaatgcccntttgtktn 
ggttgcttccctaaaaktt 



FIGURE 26B 



AR<K7VRAEi^DQVGMAEGGTGPDGRAGPGPAGPNIJCEWLREQF 

RSRINEKSVWCCGWLPCTPIJil AATAGHGNC^rDFLIRKGAEVDriVDV^CGQTALVVAV^ STE ILIiEAGADPNGSRBH 
RSTPVYEAXRVGRDDILKALIRYGADVOVKaHIiNSDTRPPFSRiaTSLWCPLYI 
PVNTQgFYRGSPGCVMDAVIiRHGCEAArVSLLVEFGANLNLVKWESLGPEARGRRKMDPEALQV^ 
AVRRALGKYRLHLVPSLPLPDPIKK^L IiYE* 



FIGURE 27 



CXIATCCAXOXIGGAGGGCGGCAGGACGACGGGCGGGCAGGGCCGGGCT 
GAGCaATTTTGTGATCATCCGCTGGAGCACTCTGAGGACACGAGGCTCC^^ 

CCTCAGGAGCCTATTGCAAGAGGAGAGCTACCGGAGCCGCATCAACGAGAAGTCTGTCTGGTTC^ 

GCACACCGTTGCGAATCGCGGCCACTGCAGGCCATGGGAGCTGTGTGGACTTCCTCATCCGGAAGGGGGC^ 

CTGGTGGACGTAAAAGGACAGACGGCCCTGTATGTGGCTGTGGTGAACGGGCACCTAGAGAGTACCCAGATCCTTCTCG^ 

AGCTGGCGCGGACCCCAAC 
h7.2 

GAGGAAGAAGAAAAGTGGACCCTGAGGCCTTGCAGGTCTTTAAAGAGGCOUiAAGTGTTCCCAGAACCT 

TGCCGTGTGGCTGTGAGAAGAGCTCTTGGCAAAKACCGGCTTCATCTG&TTCCTTCGCTGCCTCTGCCAGACCCCA^ 

GAAGTTTCTACTCCATGAGTAGACTCCAAGTGCTGCGGTTGATTCCAGTGAGGGAGAAAGTGATCTGCA 

ACCGAGCCCTGAGTGCTGTGCTGCTGCTGGTCTCCTGATGGCTGTTGCTGCAGAA6ATGTCCTCGTAGACT 

CCTCAGGTGCCTGGGCCGCTGAACAGTCCTTGGGTCATTGTCAGCTGAGAGGCTTATACTAAAGTT 

CAAGTTCTCTGTTCTGGATTTTCAGTTGCATATTAATGTAACGGGCCATGGGQTATGTACATGTAGGGG^ 

GGCCTACTAATTTCCTGTAGGGAAGACTCCCAGCACTTCTGGAACTGTGCTTCT 

TGGTTCGATTAAAGCCTTCTAGTATCTCAATGAAAA 



FIGURE 28 




FIGURE 29A 



CTGATGTCCGCAATTCTGAAGGTTGGACACCACTGCTGGCTGCCTGTGACATCCGCTGTCAATCCCC^ 

GCCACCACCAACCGCTGTTTTC^^CTGTGCCGCTTGCTGCTGTCTGTGGGGGCAGATGCTGATGAAT^ 

TTCAGCTTCCTGAGGAGGCaU^GGgCTTGGTGCCACCAGAGATTCTACAGAAGTACCATTO^ 

GCCTTGGTGAGGCAGCCCAGGTCGCTGCAGCATCTCTGCCGTTGTGCGCTCCGCAGTCACCTGGAGGGCTGTCTGCC^ 
TGCACTACCGCGCCTTGCCCTGCCACCGCGCATGCTCCGCTTTCTGCAGCTGGACTTTGAGGATCTGCTCTACTAGgctt 
gc-tgccctgtgaacaaagcagaccccacccccaccccaagggcatctctcagcaatgaatgatgcaaggcggtctgtctt 
caagtcaggagtggacgccttgatccacacttgagagaagaggccagatcagcaccyggctggtagtgatngcagagggc 
acctgtgcagatctgtgtgcgcactggaaatctctaggctgaaggcyagagcaaatggtgcargtgttagtccttgggan 
gagagacaganggtgagaaagcaagacagaggtgagagtgcacatgtcaagtggtagattgccttaaaagaaagctaaaa 
aaagaaaaagattcgggcgaacttctt-tagggg^aatgctgcagcgtgttaaactgactgaccagcgtccatatctttgg 
acccttcccggg-tgaaaaagccccttcatcctccagcgctccccaagggtgcttagcaataccgggtgcttttctgccgc 

aaagtgagttaccaaa 



FIGURE 29B 



. . - .MSAIi:JCVGHaCWI.PVTSAVMPQWaJtPPPXAVrNCAACCCI,WGQMI,MNTYRWC3I,PEEAKGLVPP^ 
SLFALVROPRSLQHI.CRCALRSHIiEGCLPHai.PRLPI.PPRMIiRFI.QrJ>FEDI.LY « 
FIGURE 31 



GTGCSGGGCGTCATCATGACCTCCTCTAGC^TCTGCAACATGACTCCTGTGGTGCAAATCAACAAAT^ 
ATCCACAAGGATCTCTGGGCCTACA&CCAGGTCCTGGTCCACATGaCTGTCGTCTTCGGAGAAGGCACCACTCGCCCCCG 

GCAGGTACGGCTGACACCTCCAT<K3GAGAAGACGTATCCAa;CAGCAGCTGCGCGGCCCTTCiUiGA«KK^ 

ATCTAAAGGCACGGTGTACTGAAGGTAGTCCTGAGACATGAGTCCGATTACTACAGGCACGTGTTCCTCCAGGTGGAGGC 

TCAGGTCCCCGGGTGAGCTGGGGCTGCAGCGGGACTCAGGGCGCGGCTCTGGCTGCAGGTCTCGCAGCTCCCTGGGCTGT 

AGCTCCCGCAGATCCTTGCGCACACCqTTGACTGGT 



FIGURE 32 



TTAATACTACCTACATAGTAGAAAATTATAACTCCACTTTAAi^ACAATGTTTTCTOT 

TTATAAACATTAATGTTGCAAGAGAATCCAGTCCATTTATGAAAATTAGTTGACAATC^ 

CTAAGCTAAAGAAATCACAGATAAAACATTTTACCAAAAGGATAGGTAACACAC^^ 

GATCATCTAATATTTCTTTAATAATAATTCTAGTTCCATAGGTTTTCATGTTATGCCAATTTGTACCCGAGTTTAATTA 

GAAAAGGCAACAATTTCTAAATTGGTGGTATACATTTCTTTACAATTTTTTAATGTAAG^ 

CTAGAAGATGAAAACGAAGGCiUCAGAAAAATTCAACTTTTCACAACCAAAAGAATTAG 

AAAAAAGTGTTGTTAAAAGATATGTTGCAGATCTCCGTTCCATTACCCAAGATTATGTCAATTCACGATTCTAAAT^^ 

TTTTTAAAGTAAGAGATTAAAAACTCATCTTCAGTGTATATGTAAATTCCGTGGTTTTATGACACAGGTATOT 

CACTGKCTTTGGAAANTGGACCATTTAAAAGGACATGGCAATTTCCATTCTGTTAAGTTTCATTCA^ 

TTC^ATTACCACATGAAATGNTGCTTTTAATGC^^TAAAAATCACAGTGGATTAGCC^ 

CATTGAGGAGAATTTGATAATTCACATTGTGATTATTCTGCACATTGATGAAACATAATTCAQ^ 

CTTCGCTTTTTTAAAGAACCAAAATAAACCCAAGACACCTTGCTGACACTTCCCCACCCCTAAA 

TAGACATAAAACTGAAATAGTTATGGCAGCAAAAGATTTTGATGGCAATGAAAGTTTGTAAACT 

TCTTATTCCCAAAGTGCAAGATGCAGGGTTCTCAATCTTTCAGTAGTGCTTCTCCTGTAAATA^ 

CAAAGGCAGTTTCTGAATTAAGTCTATTCTGGTATACTGACGTATAACAAAACGACACAGGTACTGC^ 

SATGAACNCCCGRGAACACTGGSTTGGYCAAGTTCTNGAamGGKAAGKTGCAGATTCCa<^^ 

CAAAAAGCTCCCATTTTCAGAGTCCCTGATTGAATGCTCCAATTAGATCAACTATGGACOT 

GTTCATAAAAGCTAAACCTACCATTTGAGTGCTCAATTCTAGTGTGAAGTGTTTTACCATC^^ 

AAAGGTAACGGTCGTCAGAACTGTCCCGAACAAGAAAAGAACCATCXGGCACGTTTGCTAGCTTCCCTTCTGCCTC 

GTGTGATTGGTCCCCAGTACCaLTCCTTGCTTTGCAAGTTTTTTCAGCTCCTCTGTAAGGCTTGTCA 

TACTTTGCACTGAGTCATAAACTCTTGCAACCCCAGGAGCAGAGTTCGGATCAAAATTCAAATGACA^ 

AGCCACGTGGGGCTTTCTGTSCCAGTGAGTCCJ^CTGAAAGTTCCCCl^TTGGGATTTGGATTATTCCTGC^ 

CAATGGTGAAGATTGGAGGGACATCCATCGTGAACCCGCTCTCCGGGGTTCTGCAACATGACTC^ 

AAGCCATTCACCGGACTGATCaiCGAAGATCTCTGGGGCGACAACTAGGTCCTGGTCTACCTGACTCT 

AGCGCGCCCTCCCACTTGAGGAGGAACCGCAGAGACTTCCATGGGAGAAGAGCTGTCCAGACAATAGCTG 

CAAAGGATACATCCCCTCATCTAAAGGCACAGTATACTGAATGTAGTCCTGAGGCATAAGTCCAATAA^ 

TTCATCCAGGTGAAGATGCAGGTCTCCATTATGAGAAGCCGAGCTCTTCAGTGAATTGGCTT 

AGACTGGAGGTCGT 



FIGURE 34 



GGCACGAGGCTGTGTCCAGCACACAGAGAGGGCCCGGCCaTCTGCTTTGGTTCAGAGCCCTGTGTCTGTCTGTCA 

ACTCTTCCTCCCGGCTCGCAGCTCACCCTCCATCCTCCTTACTGGCTCCAGCAXGACTCGCTTCTCTT^ 

TTGCTCTGTTTCACTCTGGCTCTGCACCTTCCAGGTCCCCTTCGTCTCCCGAGAACCCACCGGCCCGCGCACCCCTGGGT 

CTGTTCCAAGGGGTCATGCAGAAGTATAGCAGCAACCTGTTCAAGACCTCCCAGATGGCGGCT^ 

GGCCATCAAGGAAGGGGATGAAGAGGCCTTGAAGATCATGATCCAGGATGGGAAGAATCTTGCA^^ 

GCTGGCTGCCGCTCCACGAGGCTGCCTACTATGGCCAGCTGGGCTGCCTGAAAGTCCTGCAGCAAGCCTACCC^ 

ATTGACCAACGCACACTGCAGGAAGAGACAGCATTATACCTGGCCACATGCAGAGAACACCT<^ 

GCTCCAGGCGGGGGCAGAGCCTGACATCTCTAACAAATCCAGGGAGACTCCACTTTACAAA 

CGGAGGCGGTGAGGATATTGGTGCGATACAACGCAGACGCCAACavCCGCTGTAACAGGGGCT 

TCTGTCTCCCGCAATGACCTGGAGGTCATGGAGATCCrrAGTGAGTGGCGGGGCCAAGGTGGAGGCC^ 

CATCACCCCTTTGTTTGTGGCTGCCCAG&GTGGGCAGCTGGAGGCCCrrGAGGTTCCTG^ 

ACACGCAGGCCAGTGACAGTGCATCAGCCCTCTACGAGGCCAGCAAGAATGAGCATGAAG^ 

TCTCAGGGCGCCGa.TGCTAACAAAGCC^iACAAGGACGGCCTGCTCCCCCTGCATGTT(^ 

AATAGTGCAGATGCTGCTGCCTGTGACCAGCCGCACGCGCGTGCGCCGTAGCGGCATCAGCCCGCTGCACCT 

AGCGCAACCi^CGACGCGGTGCTGGAGGCGCTGCTGGCCGCGCGCTTCGACGTGAACGCACCTCTGGCTCC 

CGCCTCTACGAGGACCGCCGCAGTTCTGCGCTCTACTTCGCTGTGGTCAACAACAATGTGTACGCG^ 

GCTGGCGGGCGCGGACCCCAACCGCGATGTCATa\GCCCTCTGCTCGTGGCCATCCGCCACGGCTGCCTGCG<^ 

AGCTGCTGTTGGACCATGGCGCCAACATCGACGCCTACATCGCCACTCJICCCCAC^ 

GCCATGAAGTGCCTGTCGTTACTCAAGTTCCTTATGGACCTCGGCTGCGATGGCGAGCCCTGCTTCTCCTGCCTOT 

CAACGGGCCGCACCACCCGCCCCGCGACCTGGCCGCTTCCaCGACGCACCCGTGGACGACAAGGC^ 

GTTCTGTGAGTTCCTGTCGGCCCCGGAAGTGAGCCGCTGGGCGGGACCCATCATCGATGTCCT 

ACGTGCAGCTGTGCTCCCGGCTGAAGGAGCACATCGACAGCTTTGAGGACTGGGCTGTCATC^ 

CCGAGACCTCTGGCTCACCTCTGCCGGCTGCGGGTTCGGAAGGCCATAGGAAAATACCGGATAAAACTCCTGGAC^ 

GCCGCTTCCCGGCAGGCTAATCAGATACTTGAAATATGAGAATACACaGTAAccagcctggagaggagatgtggcc^ 

gactgtttcc^ggacgccccaggtggcctgcatecaggaccccctggggtcagaacaggtgtgaccttgctggttctttg 

ctggagcttcacccaaagtgagaacctgatgtggggagtggacgtggaacc'tc-tgcttt.cacactgtcagcggatcgcag 

acccgctC'tgctt:ct,ggc<=at:agccagagacctt:caacctgrg9gccaggggagrigctggt;c-tgg'9caaggtggccc«ggc 

aggaatcctggccttaagctggagaacttgtaggaatccctcact-ggaccc-tcagctttcaggctgcgagggagacgecc 

agcccaagtattttat-ttcwcgtgacacaataacgttgtatcagaaaaaaaaaaaaacatgggcgcagcttattccttag 

tagggtatttacttgca-tgcngcgcttaaagcixtactggaaacatgcgt.tccnactat:gct:tgagaa-tccccttgcactg 

gtaaacgagagccgacgtgcttcaaggtiiiggatttttggnttgcccctttggcgttccgcggg-tttgn'tccgacngtaat 

tgaccccgt:gtti:tgt;cact.tt.cgagtgt.tccgactatt:ggggggctt:'tt:ggttgtccccaaaattgtgggtiggtgt:gcg 

gacgccacgagaagtggtitcatgggcgatiaatcattactgngagaatgtagagcggcggttttacgaataaatatttttt 

aagccgcct:t:cccaaaa 



FIGURE 35 



h10.1 

CCTCCTGAGAGTTCGCCGGCCCGGGCCCAATGGGnTTGTTCC^^GGGGTCATGCAOJL^ 

ACCTCCCAGCTGGCGCCTGCGGACCGCTTGATJWUiGGCC&Ta^ 

GGAAGGGAAG^y^TCTCGCAGAGCCCAACAAGGAGGGCTGGCTGCCGCTGC^ 

GCCTGAAAGTCCTGCAGCGAGCGTACCCAGGGACCATCGACCAGCGCACCCTGCAGGAGGAlUiCAGCCGTTO 
ACGTGCAGGGGCCACCTGGACTGOrCTCCTGTCACTGCTCCAAGCAGGGGCAGAGCGGGACATCTCCAA 
GAIVACCGCTCTACAAAGCCTGTGAGCGCAAGAACGCGGAAGCCGTGAAGATTCTTGGTG^ 
AACGCTGCAACCGGGCXG 



h10.2 

GTGCAGCTCTGCTCGCGGCTGAAGG^U^CACATCGACAGCTTTGAGGACTGGGCCGTCATCAAGGAGAAGGCA 

AAGACCTCTGGCTCACCTTTGCCGACTGCGGGTTCGAAAGGCCATTGGGAAATACCGTATAAAACTCCTAG^ 

CGCTCCCAGGCAGGCTGATTAGATACCTGAAATACGAGAACACCCAGTAACTGGGGCCACGGGGAGAGAGGRGTAGCCCC 

TCAGACTCTTCTTACTAAGTCTCAGGACGTCGGTGTTCCCAACTCCAAGGGGACCTGGTGACAGAcGAG<^ 

CCTCCCTCIXrAGCCTGGACAGCTACCAGGATCTCACTGGGTCrCAGGGcCCAGAGC^ 

GTGTCAAGGAGAAGAATCATTTGTTTACAAACTGATGAGOVG&TCCCJ^GACCMCTCTACCTTC^ 

TCTATTCCTGGGGCCAGGGCAGAGCTTGAGGTGTTCTGGGGAAGGTGGTGCTCAGAGCCTTCCCTGTGC 

*i - ':tggaaaactcaccacttgacttcagagctttctctccaaagactaagatgaagacgtggcccaaggtaggggg 
gggagcctgggtcttggagog^:tttgttaagtattaatataataaatgttacacatgtgaaaa^ 



FIGURE 36A 



TTGGAGAAGTGTGGTTGGTATTGGGGGCCAATGAATTGGGAAGAl'GCAGAGATGAAGCTGAAAG^^ 

TTCCTGGTACGAGACAGTTCTGATCCTCGTTACATCCTGAGCCTCAGTTTCCGATCACAGGGT^^ 

ATGGAGCACTACAGAGGAACCTTCAGCCTGTGGTGTCATCCCAAGTTTGAGGACCGCTGTC^ 

AAGAGAGCCATTATGCACTCCAAGAATGGAAAGTTTCTCTATTTCTTAAGATCCAGGGTTCCAGGACT^ 

GTCCAGCTGCTCTATCCAGTGTCCCGATTCAGCAATGTCAAATCCCTCCAGCACCTTTGC^ 

GTCAGGATAGATCACATCCCAGATCTCCCACTGCCTAAACCTCTGATCTCTTATATCCGAAAGTTCTACTACTATGATCCT 

CAGGAAC^GGTATACCTGTCTCTAAAGGAAGCGCAGCGTCAGTTTCCAAACAGAAGCAAGAM 

GAGGGGCTCCCTGCTGGTCACCACCAAGGGCATTTGGTTGCCAAGCTCCAGCarrTGAj^gaaCcaaa^ 

aagaagaggaaaagtgagggaacaggaaggttgggattctctgtgcagagactttggttccccacgcaagccctggggctt 

ggaa9aagcacatgaccgtaGtctgcg-tg9ggctccacctcacacccacccctgggcatct.i:aggactggaggggctcclit 

ggaaaact^jgaagaagtctcaacac'tgtttctttt'tca 



FIGURE 36B 



LEKCGWYWGPMNWEDAOBMKIJCGKPDGSFLVRDSSDPRYXLSLSrRSQGITHaT^^ 

VEFlKRAIMHSKNGKFLYFIiRSRVPGLPPTFVQLLYPVSXlFSN VKSLQBLCRFRIRQLVRIDHIPDLPLPKPLrSYIR KFY 
YYDPOEEVXI-SIJCEAQRQFPNRSKRWNPPRSEGLPAGHHQGHLVAJKLQri* 



FIGURE 37 




FIGURE 38 




FIGURE 39 



GTTCCAAGCCTAACCCATCTTTGTCGTTTGGJWVATTCGGGCCAGTCTAAAAGOVGA^^ 
CCATCAGTTGCCACTTCCCAGAAGl'CTGCAGAACTATTTGCTCTATGAAGAGGTTTTAAGAATG?^ 
CAGCAGCTAATCAGGATGGAGAAACCAGCAAGGCCACCTGAc ac a gg tc c 1 1 1 aat 

tgtgtgactgtttggat.t.tggtga'tcaaatgtccatgtttacagttgcttttcccagtttgtgtct.ttcccaa-tattgtg 
aacc-ttat.ccatcttigccttactcagttttatttctagtgcactttgt'tgt.gtattatttgtttacctgaccattttcta 
ctt.tLa-ttctgctaat.aaactgtaat.'tct.gaaaaaaaaaaaaaaaaaaaaaadaaaaaaaaaaaa 



FIGURE 40 



GGGGATCGAAAGCGGGGGCTTCTG<K3ACGCAGCTCTGGAGACGCGGCCTCGCACCAGCC^^ 
CACGGCAGACTGGTCA^yvCAAATGGATTTTACAGAGGCTTACGCGGACAC^ 

AGGCAATCTTAAAGTCTTAAGGAAACTGCTCAAA?UGGGCCGAAGTGTCGATGTTGCTGATAACAGG<^ 
l^TCATGAAGCAGCTTATOVCAACTCTGTAGAATGTTTGGAAATGTTAATTAATGCAGAT 

ATGAAGACCOTTGAAGGTTTCTGTGCTTTGCATCTCGCTGCAAGTCAAGGACAa?TGGAAAATCGTACAGATTCTTTTAGA 

AGCTGGGCCAGATCC7AATGCiUi.CTACTTTAGAAGAAACGACACCATTGTTTTTAGCTGTTGAAAATGGACAGAT^ 

TGTTAAGGCTGTTGCTTCAACACGGAGCAAATGT^TJ^TGGATCCCATTCTATGTGTGGATGGiUlCTCCTTGi^ 

tcttttcaggaaaatgctgagatcataaaattgcttcto^agaaaaggagouuvcaaggaat 

cacaccti:tatttgtggctgctcagi'atggccaagctagaaagctttgaagcatacttatttcatccgg<?^ 

aaotgtcaagccttggacaaagctacc 

cacaaatgggaccatacaaaaatcttggnacttgttaataaccacttnactaaccgggacctgtga 
aaagtaagtccctgtttactcagngagtgtttgggggacat<^ggattgcctagnaaatattactccggaatggtct^ 

AGCCCAGNACGCCCAGGCGTGCCTTGTTTTTGGATTCAOTTCTCCTGTGTGCATGGCTTTCCAAAAGGAGOT 

RAGTTCTTTG<aU^Ta^GTGAACATTCTTTTGAAATATGGAGCCCAGATAAATGAACTTCATTTGGG^ 

CGAGAAGTTTTCGATATTTCGCTACTTTTTGAGGAAAGGTTGCTCATTGGGiACCATGGAACCATATATATGAAT^ 

ATCATGCAATTAAAGCACAAGCAAAATATAAGGAGTGGTTGCCACATCTTCTGGTTGCTGGAW 

CTGTGCAATTCTTGGATTGACTCAGTCAGCATTGACACCCTTATCTTCACTTTGGAGTTTACTAATTG<^ 

ACCAGCTGTTGAAAGGATGCTCTCTGCTCGTGCCTCAAACGCTTGGATTCTACAGCAACATATTG^ 

CTGACCCATCTTTGTCGTTTGGAAAT'TCGGTCCAGTCTAAAATCAGAACGTCTACGGTCTGACAGTTATAT 

GCCACTTCCCAGAAGCCTACATAATTATTTGCTCTATGAAGACGTTCTGAGGATGTATGAAGT^COVGAACTGG^ 

TTCAAGATGGATAAATCAGTGTUU^CTACTTAACACAGCTAATTTTTTTCTCTGAAAAA 

GAGTAGAAGTTTTTATGATTTTATAGTCAAAAGATGATTATTCATTGTCAGATAGGTTAGGTTTTGGGG<^ 
CAGTGAGAAMMTAtGTTTACAACTAGCCTTCCCAGTAAAAAAAAAAAAAAAAA^ 
FIGURE 42A 



SSScgctgggaw^gagcgcctg<k3tgccatccccck;tc^ 

CCAGTGAtccacatcccaggaccgccatacgacagccatctggtgccaartcactgagcccgttggggtccgccgacccc 

tgcgcctgggatggaygcccaoctcagccatgggcagacgtgccccotcatcctaccggctgcctctgctgggggaacct 

a?g?caac;gactL;Iccttcccaacactggctgaagcagcagcacocaggcccttccctgaaccagatgcaga^^^ 

JactatgalLcctctctcaggcgccttctgctctcaggtggagtgggctgccccccactctctgcagagagaggctaca 

occacc?ggggggtcctgggaggtaagactagtaggaggtgccagggctgartccaaaagcaggaatggccaggamcag^ 

ccatacag^SagctcIggatgtcacataccatggacaintgagacagaaccccaggttggamttccctt^ 

gtgccagctttaatgtcagctgcmggtgctctgtggcctgtatttattctttaaacagtagcaaaggccatttatttatt 

ccacttagaaaggaaaccttggtgggtggyttccctcgatgtgctttcccccacctccctggaatgtgtgtgccacacct 

gtccttgtcccaggccaggactgtggcacatgagctggtgtgoacagatacacgtatgtcgtcgtgoatgacccctgact 

Igttcctaagtagccctgcaccaagcaccagagcagacoccaagagaggcccgtgcaagtccccatgtccccaggtccct 

qlttctgttgccttgggaotcatacaccggcacacgtgtttcagcctcttgacttccatgagcttcgaattttgcccccg 

attettctgatatttcccattggcatcctccaaagctctgggcctggagggcattaggacacatggaatgagtggggtct 

ccagcccctgggaaagccactggcaaggcaggattagaaagaccaagagcagggtggggcgccatgaagcotgtatgcct 

ctcaggctcLgaccccgccaoacacccactcaagcctcagaagtggtgtgtagggcagccccaggagaggaatgcctgt 

cctagcagcacgtacatggagcaccccacatgtgctccagccctctggctgtttctcttgctctagaatcaactccctac 

attgggaatgtagccatttggtagaggacttgcctagcctgcaggaagctcaogttccatcccctgcaccaaggagaatc 

aaagltcaggaggctgaggcaggaggattgotgtcagtggtgtacagaggtcatggccatcctgggctatattaaacctt 

gtcctttaagaaaaagaaaagaaatcaacttccattgaatctgagttctgctcatttctgcacaggtacaatagatgact 

tkatttgttgaaaaatgkttaatatatttacmtatatatatatttgtaagaagcatt 



FIGURE 42B 
GGWDLGRKRLYHDGKNQPSKTYPATLEPDE'IFIVPDSFFVALDllXDGTLSFIVDGQYMGVAFRGLKGKKLYPVVSAV 

WGHCE IR«RYLNGl,DPEPLPLHDI.CRRSVKrALGKERLGAIPAI.PI.PASLKAyL.LYQ* 



FIGURE 43 



AAGGGTAAAAAACTGTATCCTGTAGTGAGTGCCGTCTGGGGCCACTGTNAGATCCGAATGCGCTACTTGAACGGACTCGAT 

CCCGAGACNTGCCGCTCATGGATTTGTGCCGTCGCTCGGTGCGCCTGGCCCTGGGOAGGGAGCGCCTGGGGGAGAACCANC 

NACCTGCCGCTGCCGGCTTCCCTCAAGGCCTACCTCCTCTACCAGTGACGTTCGCCATCATACCGCCAGCGCGACAGCCAC 
CTGGTGCCAACTCACTGAGCCGCCTG 
FIGURE 45A 



i^GTGGCGGCGOTCCCTGGAGAGCAGGCGGAGGCAGCGGCAAGTCTGACTCTGGGCTGACCG^ 

GGGCTGACAGCCJ!lGGCCTCCGCCTGGCGGGAGCCGCACGAGGAGCGG<3AGTGGCCGG<XCTCTCTTCCGCG^ 

gcgccgggtgatggcggtggtgatggcggcaggcgctcggacagctccgcttgagctgagctcggagagatccgtcc^ 

aagtgcccagaagaaacttcctcttagaaaagctgaaaaacacartatttataacyvctgga 

aaaatcgctgaaaacaatagtaaaaatgtagatgtacggcctaaaacaagtcggagtcgaagtgctgac^ 

ttatgtgtggagtggaaagaagttgtcttggtccaaaaagagtgagagttgttctgaatctgaagccataggtactgttg 

agaatgttgaaattcctctaagaagccaagaaaggcagcttagctgttcgtccattgagttggacttagatcattcct^ 

gggcatagatttttaggccgatcccttaaacagaaactgcaagatgcggtggggcagtgttttccaataaag^ 

TGGCCGACACTCTCCAGGGCTTCCATCTAAAAGAAAGATTCATATCAGTGAACTO^TGTTAGATAAGTGCCCTTTCC^^ 

CTCGCTCAGATTTAGCCTTTAGGTGGCATTTTATTAAACGACACACTGTXCCTATGAGTCCCAACTCAGATGAATGGGTG 

AGTGCAGACCTGTCTGAGAGGAAACTGAGAGATGCTCAGCTGAAACGAAGAAACACAGAAGATGAC^ 

ACATACCAATGGCCAGCCTTGTGTCATAACTGCCAACAGTGCTTCGTGTACAGGTGGTCAC^ 

ACTTGGTCACAAACAACAGCATAGAAGACAGTGAaVTGGATT'CAGAGGA 

AAAAGGAATAAGCCCAGGTGGGAAATGGAAGAGGAGATCCTGCAGTTGGAGGCACCTCCXAAGTTCC^ 

CTACGTCCACTGCCTTGTTCCAGACCTCCTTCAGATCAGTAACAATCCGTGCTACTGGGGTGTCyVTGGAa^ 

CCGAAGCTCTGC-rGGAAGGAAAGCCAGAGGGCACCTTTTTACTTCGAGATTCAGCGCAGGAAGATTAT^ 

AGTTTTAGACGCXACAGTCGTTCTCTTCATGCTAGAATTGAGCAGTGGAATCATAACTTTAGCTTTGATGCCCAT^^ 

TTGTGTCTtCCATTCT^CCTGATATTACTGGGCTCCTGGAACACTATAAGGACCCCAGTGCCTGTATGTTCTTTGAGCCGC 

TCTTGTCCACTCCCTTAATCCGGACGTTCCCCTTTTCCTTGCAGCATATTTGCJiGAACGGTTATTTGTAATTGT^^ 

TACGATGGCATCGATGCCCTTCCCATTCCTTCGCCTATGAAATTGTATCTGAAGQAATACCAT 

GTTACTCAGGATTGATGTGCQ^GAGCAGCAGTGAtgcggagaggttagaatgtcgacctgcatacatattttcatttaa^ 
attttatttttcttatgcctctttgaatttttgtacaaaggcagttgaatcaaataaaactgtgccctaagttttaattc 
cagatcaatttattttttttatgatacacttgttatatatttttaagcaggtgtt-tggttttgtttttaccatataaatt 
tacatatggtccaggca-tatttacaatttcaaggcattgcatataca-tt-tgaatattctgtattttttaaataat.ctttt 
gttctttcctatgtgtgaaatattttgctaatctatgctatcagtattcttgtatgaccgaatagttacctattctcttt 
tcatcttgaagattttcagtaaagagtgttgtaatcaatccattataatgtaattgacttttgtaatttgccaataggag 
tgttaaacaacaaaatgatttaaaatgaaacttaatgtattttcattttaaatattaactaaaccaagtttgtttgttag 
ttattctagcc^ataagaaaagagaatgtagcatcctagaggtgtatttgttctgcagtttggcaggaccgtcagttagt 
ccaaataaacatcccctcagcgtggaggcgaatggaacctgtgctccHttcttacgggaagctttgcaaagcaaaatagc 
agggttacaagcttggagttgttaaggcaactagagttttctctattaatttatagactgttgttgcacctacttagctc 
ttttttgqcjaactctaqttcccagg^gaaaatacctcgtgcc 



FIGURE 45B 



SGGGPWRAGGGSGKSDSGLTVEPGRGLTARPPPGGSRTRSGSGRASLPIUiSEiyiVMAVV^^ 

VQKVPRRNFLIiEKLKHTXFITLBlVKNI/FKMAENNSKKVDVRPKTSRSRSADRKDGrWSQ^LSWSKKSESCSES^^ 

TVKNVBIPU^SQERQLSCSSIELDIiDHSCGaRFLGRSLKOKLQDAVGQCFPIKNCSGRaSPGLPSKRKIaISBL^tLD 

FPPRSDIAFRWHFIKRBIWMSPNSDEWVSADLSERKIJ^DAQLKRRNTBDDI 

MMNLWNNSIEDSDMDSEDEIITLCTSSRKRNKPRVmMEEEILQIiEAPPKrHTQIDYVHCLVPDLLQISNN^ 
YAAEALLEGKPEGTFrjJ^DSAQEDYLFSVSrRRVSRSLeARIEQWNaNFSFDAHDPCVFaSPDITGIiLEaTO 
EPLLSTPLIRTFPFSLQaiC^TVICNCTTYDGIDALPIPSPMKLyiiKEYHYKSKVRLIJlIDVP^ 



FIGURE 46 



[Schematic diagram; legible labels: mESTs, hESTs; mBAC, 299bp intron; hBAC, 918bp intron] 


ccctct gggcaagccgccccccccccacccatctaccacaccicacacacacacacacacacaccicaticcag^cct tgggg 

caaaaacaaatgcdiiaatciacaac^acaaaaacac tgcc?:gcggaaagcccc tdct tcaggaaggc Lggcaga tgaggagc 

aagggaacaccctaccaggaccgccacaaaggagcccctccccccaacggccccccaagacagggc t Cctctgcatagcc 

ctggctgtcctggagcccact ttgtagaccaggctggcctcgaactcagaaat ccgcccgcccccgcct cccgagcgctg 

ggattaaaggcgtgcagcaccacgcccaactggcatccccccaac Caaggcccgcccct t tcagataactctaggttctg 

ggccaagccgacacaaggccacacagcacagcctgtacgccacattcagttcagaagacacccaacccccccggaaccgg 

aacccacgcacatttgtgagcttccacttgggagtgggaacccgaaccgggccccccgcaagagcagipcgtgctct taac 

cgccgagccatt tcagcagcctcacatcagaattaagttaaaaccagccgggcacgaaccacaccct cagaatcctagca 

CCtgaaagcagagctaagagaaacagggatCcaagaccagcccccggccacagagcccgccccgccctaggatgggctiac 

aagagactat t tcaaagccatccaaacaacaacaaccacaacaacaacaaggccaaaattaggctgggcacagggtacac 

acccttaatgccaacactcaggaggcagaggcaggctgaccagcgcgagcccgagcccaacgtggtctacacagggagcc 

ctaggccagcagaggttacagtcccc.ccccccccccccci;ctctctctci:ctctcacacacacacacacacacacacaca 

cacacacacacacacacggcggcatcabgggatthttttgggataaggtttccctgtccagccctggcatagattcactc 

tgtagactaggccagccttgaactcagagatccgcctgcccccgccccccaagcgctigggattataggtgttgcaccacc 

accgcccagccacfc ttgggatttt tgaactgctaccaagaggctttcgaggaggtcaaacc tcaacagcaacccccccac 

ga caa tg t age t aa t ga ccaaacgaca ctcaaaacttaaccctitaaagcacacatccaccagacagcgtgcccac teg t a 

gt tccattacccaggaggccgaagcaggaggatgaaggacCaaggcttcagcaacctagggagccgcaggggacagtagt 

cccaatccctacattctcccgaacacaggagcaggagttcaggaagggcgccaaggccgctcactgatcttagggcctca 

ggaatgactagcccaggcagagagagcaaaggtccccagtggagaagcccacacacacacacacacacacacacacacac 

acacacacacagaatccaaggcgacgacgtcatcaaagggccaaccccagcccgggatgggggggagggtggggcacgca 

gccgccaggtggctccggaaaaataaactgctgaagagcccgacgccagggagtcctgggagggacaagaggccacccac 

ccaaagagcgcgccccacaaagcatgcgcgcttgcccacgcctggagtcgtcacttattctccgcccggactccttgtag 

ccggcgggccctcaaggcggtaagtggtgtggccgccgcggcccgggaggtgacgacagggctaaccgcccacagagccc 

aggggcggagcgcgggcgggcgtccgcagccccgccggagccggaagcagtggcfcggccaggggcgcccccagccccccc 

tatctgtactcccacagaggcccccgcgagctagggggacagcgaggcgcggggtaggggcccggcgt tagagccagcaa 

ggggacggt tcacggcaaggtctgagggagagagagctcccgagaaacttggggggcgcgacacagatagggtgaaagca 

gagtgatagacctgggacggttaggggaccaagggaagaccaggccggttggcatacaccggtgaacgga cgggagtccc 

agggaaagatgacgcgcctaacagtcctttccgtctccacaccaccccaggggacgatccggagctcaactttcaaaagc 

gagacgccccagcaagcctgttttgagaagttcttcagcggctcCCCCCATG<3GCCAGACGGCCCTGGCAAGGGGCAGCA 

GCAGCACCCCTACCTCGCAGGCTCTGTACTCGGACTTCTCTCCTCCCGAGGGCTTGGAGGAGCTCCTGTCTGCTCCCCCT 

CCTGACCTGGTTGCCCAACGGCACCACGGCTGGA,^CCCCAAGGATTGCTCCGAGAACATCGATGTCAAGGAAGGGGGTCT 

GTGCTTTGAGCGGCGCCCTGTGGCCCAGAGCACTGATGGAGTCCGGGGGAAACGGGGCTATTCGAGAGGTCTGCACGCCT 

GGGAGATCAGCTGGCCCCTGGAGCAAAGGGGCACACACGCCGTGGTGGGCGTGGCCACCGCCCTCGCCCCGCTGCAGGCT 

GACCACTATGCGGCGCTTTTGGGCAGCAACAGCGAGTCCTGGGGCTGGGATATTGGGCGGGGAAAArTGTATCATCAGAG 

TAAGGGCCTCGAGGCCCCCCAGTATCCAGCTGGACCTCAGGGTGAGCAGCTAGTGGTGCCAGAGAGACTGCTGGTGGTTC 

TGGACATGGAGGAGGGGACTCTTGGCTACTCTATTGGGGGCACGTACCTGGGACCAGCCTTCCGTGGACTGAAGGGGAGG 

ACCCTCTATCCCTCTGTAAGTGCTCTTTGGGGCCAGTGCCAGGTCCGCATCCGCTACATGGGCGAAAGAAGAGgCgagat 

acggactaggtgtggggagatcaccactcttggcaatggtttgggctggaaactcatggccggagcacaggaagtaggct 

tcttgtcacctcggcccgtcacttagatggccttggatctagct tcactcccaacccccatcggatgtgacgcacaaatt 

cagagcctttgggtctccctcagccgaggcggcggtggaaatggaggaagaaggaagggcgcccgagcaggacctcaagt 

CCaaggaCgcctLSgagttgcttacttaccttgcctdcccccCCCCCCcgcagTGGAGGAACCACAATCCCTTCTGCACCT 

GAGCCGCCTGTGTGTGCGCCATGCTCTGGGGGACACCCGGCTGGGTCAAATATCCACTCTGCCTTTGCCCCCTGCCATGA 

AGCGCTATCTGCTCTACAAATGAcccagtagtacagggtgtgccggcaccccaccgtggggacaggtggagaggcacccg 

ctggcccagacaactttaaaaagctggtgaagctggggggggggggccggaccccttcacctccccttctcacaggagca 

agacatatagaaatgatattaaacaccatggcagcccgggacaaagaggtttttgaagtaaaaaatgagatgtattgtca 

caacctgt t tcactac cgct ccccgccccgc tttacactcccccaccccaggccagagccccaccaccgcct taaggaat 

tatgacaacccacaaagctcaggcccaggtgtttatttcccttacatgtaggatggtrcacaaacacaacacaggggcct 

tggcaccgtgggggaggggactatcccaggcctcttagggtctcatgtataccgaattcagacccgaaagccccgaattt 

ctgcaccagacatccagtagaacttgggagtgaagctagagccaaggccatctaagtgacaggccaaagtgacacgaagc 

ccacttcctgtgctccaaccatgagt ttccagcccaaaccaatggaaggcga^icccacccgccagggcccaaagggacag 

tcagttctactcccccccctcactaggagccacctcggcgacagccgaccccacccactgtaagtggtaaagggattggc 

ctggtcccaaccataatagggcggtggaaacggcccaggagggtacagcgtggattaggccacaagatggggcagatgat 

gtcatcagaagcacgtgaccggtgggagcagtcactaaacttctgggcaacctagtccatgctatgcaggcaggtagagg 

gatgggcagtgctcattgtCtggcattgatgatgtccacaaactcaggcttgagagatgcgccacccacaaggaagccgt 

ccacgtcaggctggcttgccagctctttgcaggttgctccagtcacagaacctgtaccaggaacaagaagacagcccggc 

caggtctatgaccagaacacccaagccccacccccccgtgcaaggcagcctcagtctgtcttagcccatttccgtcttag 

ccagagccaaagccactcacctccataaatgatccgggtgcc:ccgagccaccccat;cactgacattggatttcagccacc 

cccggagcttctcgtgcacctcccgcgcctagaaggaggaggcagagctactaagtaagctcctccctacccatcacfcca 

^gg^^taaaaaccactggttcccacatagagtcgagcctccagaaaagccccgggaccagagagcggcaaggccccaaCc 

ccacceggct tggaatgaacatttccggcaaagtcactccccccggcgagtttgggggccctctgtctctaaaggggctt 

ggatgggccccafcagctgtgtgagtctgttaaagccggacaggccgaggagctctgggtagttacccgccgaggggccgc 



FIGURE 47A 



FIGURE 47A (CONTINUED) 



gtcccgccagtcccaatggcccacacaggtccataggccaggaccaccttgctccagcccctcacaccacctgtggggc 
gagagga^gagtgagtaggaaggagctg4<^c:cgccaagc 



FIGURE 47B 



MOw^ALARGSSSTPTSQALYSDFSPPEGLEELLSAPPPDLVAQRHHGWNPKDCSENIDVKEGGLCFERRPVAQSTDGVRGK 
RGYSRGLHAWEISWPLEQRGTHAWGVATALAPLQADHVAALLGSNSESWGWDlGRGKLYKQSKGLEAPQrPAGPQGEQLV 

VPERLLWGDMEEGTLGYSIGGTYLGPAFRGLKGRTLYPSVSAVWGQCQVRIRYMGERRVEEEQSLiJt^^ 
r^LGOlSTLPLPPAI^KRYr.LYK 



FIGURE 48A 



y tact t tec CcatatctccAtaat tt tat ccaccat tactacacgatacat tacctcataaaagtctccgtaacctcccc 

aaggattcaccgcttaatctccagtgcttagcacaaatcatcaaatgcgaaccagaaactct tccaaacgtgttacacct 

acaaccccattggat tcccactaccaaccccatgcaatagacaccaacgtgatccccgtcttacagaggaagaaacaggc 

acagggaggttcagcaacttgcccaaggccatacacacaccggccctcaggtacccatgcccggggagcctggtcccaca 

gccggcatgtttgccactatattatattgcctcct tatagcgccggcactcatcaagcacattgacagccatgct tggtg 

agcgactactatgtacccagctctgtgccacatgctttacccggattatttcaaccgcacaacaaccctgcgaggtaacc 

accatcattgctcctaccccacataacag&aaactacagaaacccggggctgggcgcagtggctcatgcccgaaatccca 

gcaccc tgggagaccccgtccctaaaaaaaacctt tttttggccggacgtggtggcccacacctgtaatcccagcactcc 

999sggctaaggcaggcagatcacaaggccagaagttccagaccagcctggccaacatggcaaaaccccgcgtctactaa 

aaatacaaaaaatagctaggcgcggcggcaggtgcccgcaatcccagctacccaggaggctgaggcaggagaatcccccg 

aacctgggagacggaggttacagagagccgagatcgtgccgccgcactccagcctgggcaacaagagcaagaccctgtct 

cgaaaaaaataaaaacaaaaataaaaacactcctttaaaaatcagccgggtgtggtagcacacgcctgtagtcccagcna 

cttgggaggctgaggtaggaggatcacttgagcccaggaggtcaaggccgcagtgggctgtgacggcgccactgcactct 

agcctcggtgacagcaagaccctgtctcaaaaaaaaaaaaaagagaaatcgggcaacc tccccaagatcgcgcagttaac 

tagtggcatagctccactcaaacccgaagtct taaccaggacactctaccaaatgagatcaacggcccagtaatggaccg 

gcatccagcatgaagactggaccagcagggagaaccacgatgcgtacagcccagagcctgaagcagatcccacagcccca 

gaggtggcacaggctgactcacaacccggggcagaaagggaccagcccagaaacagtgacccagaatcacagggaagtag 

aaatgggattcggcacaacgaagcccctccccgaccccatgctccttaccctcaggggcgcaggagctagtcgctcaggc 

ggcccaaaggtcttgacggcggagaacaccacccccagggat tcccgacgcggtgatgccatcaaagcgt taattccgag 

acgggcctgcccgggtgcggactctgccgcagcaagagaagggtcaaccgccccgggcctccgccgtgggggcggggcct 

cggggagggtcacagcccgggactgagacccgaggttaaccgcccggggtgggctccacgggggcggggcatgcCctccg 

cggccgctgccggtatagagcggtaactgcccaggagggggcggggccccacaggggcgtggccccggagctgcacggcc 

gt^srggcggcgatgagagggctaagccccagagggccctggaggggcggggccgcgggacgggctcggcccaagggaggag 

ctgggggcggaagcggccggcggcctgcgccctgcgcgccccggcttctttccgcccggctccttcagaggcccggcgac 

ctccagggctgggaagtcaaccgaggttcgggggcagcggcgagggctccgggcgagtaagggggatggtccatgctgag 

gcccaaatggggcgaactcgcgagagcccctggcgacctggaccagatggggcgagggcagatgaagggcccaggagctt 

tggggcagcgaggagggaggagcgggcccgttggcaaacttgggtgaaaggacggggtacctgggcgacgagcccccgcc 

aggattctgcccctcacgccccccctcccccagctcccccccaggtcaatccaaaccggagctcaacccccagaagagaa 

agacgccccagcaagcctctttcggggagccctctagctccccacctccATGGGCCAGACAGCTCTGGCAGGGGGCAGCA 

GCAGCACCCCCACGCCACAGGCCCTGtACCCTGACCTCTCCTGTCCCGAGGGCTTGGAAGAOCTGCTGTCTGCACCCCCT 

CCTGACCTGGGGGCCCAGCGGCGCCACGGTTGGAACCCCAAAGACTGTTCAGAGAJ^CATCGAGGTCAAGGAAGGAGGGTT 

GTACTTTGAGCGGCGGCCCGTGGCCCAGAGCACTGATGGGGCCCGGGGTAAGAGGGGCTATTCAAGGGGCCTGCACGCCT 

GGGAGATCAGCTGGCCCCTAGAGCAGAGGGGCACGCATGCCGTGGTGGGCGTGGCCACGGCCCTCGCCCCGCTGCAGACT 

GACCACTACGCGGCGCTGCTCGGCAGCAACAGCGAGTCGTGGGGCTGGGACATCGGGCGGGGGAAGCTGTACCATCAGAG 

CAAGGGGCCCGGAGCCCCCCAGTATCCAGCGGGAACTCAGGGTGAGCAGCTGGAGGTGCCAGAGAGACTGCTGGTGGTTC 

TGGACATGGAGGAGGGAACTCTGGGCTACGCTATTGGGGGCACCTACCTGGGGCCAGCATTCCGCGGACTCAAGGGCAGG 

ACCCTCTATCCGGCAGTAAGCGCTGTCTGGGGCCAGTGCCAGGTCCGCATCCGCTACCTGGGCGAAAGGAGAGgCgaggc 

ctggggcagacgcggggagaacttcctgtccctggcggcagtggtttgggacggaaactcttccgacaagagcagagggg 

acggaccttcatccagcctgcctcaacctctgttcagcgctgggaaaggctaggggtcttcacagccgttatttaactca 

acccaacagcaatagaggcgaaacaggcctgagaaagcaaccctctcaagttctcctggccagtaaacggcgaaccctca 

gaacggagggaggaaccgcagggatgagagaattcaggagacatcaacccctgagcaagaggtgcaaagcgttaggtacc 

gggcccgatgtacaggcccaaaagaaggacgggcagagccaggcacccaggctgtataccggattccctgggccctaacc 

cgtctctgtgccacatacctacttccctcctcagccacacctctggatggagacaccggggccctgggcaccagggagga 

gagcagtggaggaggcagggcctcagggcggggcagcaggggaggagcctccccaggaactgactgggtccagggcttgg 



FIGURE 48A (CONTINUED) 



agctgctctctgcagttgtgtgggctgnagagtggagggccatccccccccaccccagccccagctcccaagcctctgga 
gtcaaagcccgggccagctccaccaccgtcagagccaccttggcctgttgtttagagggccttagccagctcttcacccc 
caactctgactagggatgtgcgaaatctcacctgggaggcagaacccccgggtatctcaaattcccctttcagccaggcg 



ggcacactcgaagcaggaaagcagaaaggcatccgagcaggaccccgcagcttgaggacatccggctggtggctgcaccc 
1 1 ac 1 1 aca 1 1 cccc t cc 1 1 c t c t c t cccagCGGAGCCACACTCCCTTCTGCACCTGAGCCGCCTGTGTGTGCGCCACAA 
^^^^^r-r„r-^r-_/-r-af-./^'pr:T,-'7«<-:i-rrTr:rrrTTGCCCCCTGCCATGAAGCGCTACCTGCTCTACCAGTGAg 




CCCCgcgacact:«n,<iya*.i-y — ---^ ^^^^ 

ccagccgctgaaagccggcgaggctgagcccccaccccaacccaaqctctgcggaaatcaacagccccagagccactcgg 
agqoaqgaagaaagggaaccagcgtccaaggccacgacagcccgctacgcaaaacattttttcaagcaaaaacagtaaga 
gacgccgccacagaaacccgcccccgcctccccccccctcttgcacaaatgatcacctatacagccgccccaaaaaggaa 
gaccacccgggcaagcccagcgaaggcagacaaaccacaagacctagtgccaggttcacccccccacatgggtggttcac 
atacacagcacagaggcacgggcaccatgggagagggcagcactcctgccctctgaggggatcttggcctcacggtgtaa 
gaagggagaggatggtttctcttctgccctcactagggcccagggaacccaggagcaaatcccaccacgccttccatctc 
tcagccaaggagaagccaccttagtgacgtttagttccaaccatcacagcaagtggagaagggattggcctggtcccaac 
cactacagggtgaagatacaaacagtaaaggaagacacagtttggatgaggccacaggaaggagcagatgacaccatcag 
aagcatatgcagggaaagggcagctactgggctcccgggccgcttagtccctggcttggcaggaagggcagggaagatgg 
atggggctcattgtttggcactgatgacgtccacgaacccgggcttgagggaagcaccacccacaaggaagccatccaca 
tcaggccggccggccagccccccgcaggccgccccagtcacagagcctgggaagggagcagaacaagggcttggtcaaga 
acgggacgagcctgccccatccccacccccatgtccgagggctcagtctagtccccagcccaccccacctcagccgggaa 
ccaaagccacccacctccataaacgatacgggcgctctgagccaccgcatcagagacgctggacttcagccatcctcgga 
gcttctcgtgtacttcctgggcctagaacaagaagctggcctaagtaagacctttcctgcctctctaagaggaaaaatca 
ctggcaccagtggacacttagtgtggtttctgaccgagtcagagtaccagggctctgatccaagccaggccccggactgg 
aLgcccccggacaagccaccgcccccgggcccaaggcccccgcgtctttgaaataaggggttgccccacgcgggccgcgc 
ctgtccaaacctattgaggcaggctgggatgagggcagggcccctgggcccggttacctgttggggtgttgcagtcttgc 
cagtaccaatggcccacacaggctcataggccaggacgaccccgccccagtccttcacgttatctgcagggcagagatac 
agacggagggaagggcgaacaagaaagagccctccagccaggttctccggagcacgaagaacggcggcccaccgcccccc 

agtggacatj^ggggg 
Blot: αFLAG

Blot: αPY



[Plasmid map: pBgalpAloxneo, 9258 bp. Labelled elements: βgal, β-globin pA, PGK promoter, PGKpA, Amp and two loxP sites; restriction sites are marked by position.]
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
Applicants: Douglas J. Hilton et al.    Docket: 10976
Serial No.: to be assigned    Dated: October 31, 1997

Filed: concurrently herewith

For: THERAPEUTIC AND DIAGNOSTIC AGENTS 



Assistant Commissioner for Patents
Washington, DC 20231



CLAIM OF PRIORITY

Sir:

Applicants in the above-identified application hereby
claim the right of priority in connection with Title 35 U.S.C.
§ 119. In due course, Applicants will submit a certified copy
of Australian Application No. P03384/96 filed November 1, 1996,
and Australian Application No. P05117/97 filed February 14,
1997, in support thereof.

Respectfully submitted,
(516) 742-4343
Registration No. 19,827
"Express Mail" mailing label number: EM422106551US
Date of Deposit: October 31, 1997

I hereby certify that this correspondence is being
deposited with the United States Postal Service "Express Mail
Post Office to Addressee" service under 37 C.F.R. 1.10 on the
date indicated above and is addressed to the Assistant
Commissioner for Patents, Washington, DC 20231.



Dated: October 31, 1997 




Mishelle Spina 